THE ADVANCED
THEORY OF STATISTICS


THE ADVANCED
THEORY OF STATISTICS


by 
MAURICE G. KENDALL, M.A. 


An Honorary Secretary of the Royal Statistical Society 
Statistician to the Chamber of Shipping of the United Kingdom 
Fellow of the Institute of Mathematical Statistics 


VOLUME II


With 30 Illustrations and 52 Tables 


SECOND EDITION 


LONDON 
CHARLES GRIFFIN & COMPANY LIMITED 
42 DRURY LANE 
1948 


[All Rights Reserved] 


TO 
PETER AND PAUL


Printed in Great Britain 
by Butler & Tanner Limited, Frome 


PREFACE TO FIRST EDITION
OF VOLUME II


This volume falls into five sections. The first, comprising chapters 17 to 20, deals 
with Estimation. The second, comprising chapters 21, 23, 24 and 26 to 28, covers the 
Theory of Statistical Tests, including the Analysis of Variance and Multivariate Analysis. 
The third, consisting of chapter 22, deals with Regression Analysis and completes the 
account of statistical relationship begun in chapters 13 to 16 of Volume I. In the fourth, 
chapter 25, I have tried to give an introductory account of the reaction of theoretical 
considerations on the Design of Statistical Inquiries. Finally, the fifth, comprising chapters 
29 and 30, deals with the Analysis of Time-Series. 

The literature of statistical theory is now so vast that it seemed worth while devoting 
considerable space to a bibliography, which is given in Appendix B. Although it is far 
from complete, I hope that it will serve its purpose in guiding the student to the main 
sources. 

The chief problem in the writing of this volume arose in connection with the logic of 
statistical inference. Whenever possible I have kept the treatment objective. It is, 
I consider, unfair in a book of this kind not to present all sides of a case, particularly when 
there is so much disagreement among the authorities. Some day I hope to show that 
this disagreement is more apparent than real, and that all the existing theories of inference 
in probability differ essentially only in matters of taste in the choice of postulates. But 
this book is not the place for such work, and for the present I am content to state the 
position and to leave the reader to exercise his own choice.

The difficulty became most acute in dealing with confidence intervals and fiducial 
inference, where two approaches which at first sight appear identical can lead to different 
results. Rather than try to reconcile them I have written a separate chapter on each. 
Professor E. S. Pearson was kind enough to read the manuscript of chapter 19 and Professor
R. A. Fisher that of chapter 20, so that I think their respective views are, at any rate, not 
misrepresented. I am very grateful to them both for their help in this connection. 

My thanks are also due to Mr. P. A. Moran and Mr. A. J. H. Morrell, who cheerfully 
undertook to help with the proof reading and to whose painstaking scrutiny I owe the 
removal of a number of obscurities and errors. I shall be grateful to any reader who 
detects and notifies me of any further slips which have evaded us. Once again I have also 
to thank the publishers and the printers for the trouble they have taken in the production 
of the finished work. 

London,
April, 1946. 




PREFACE TO SECOND EDITION
OF VOLUME II


A few misprints have been corrected, but otherwise this edition is the same as its 
predecessor. The exhaustion of the first edition in little more than a year has been a 
very gratifying sign that the book is filling a need both at home and abroad. 


London,
August, 1947. 


TABLE OF CONTENTS

CHAP.
17. Estimation : Likelihood
18. Estimation : Miscellaneous Methods
19. Confidence Intervals
20. Fiducial Inference
21. Some Common Tests of Significance
22. Regression
23. The Analysis of Variance (1)
24. The Analysis of Variance (2)
25. The Design of Sampling Inquiries
26. General Theory of Significance-Tests (1)
27. General Theory of Significance-Tests (2)
28. Multivariate Analysis
29. Time-Series (1)
30. Time-Series (2)

Appendix A: Addenda to Volume I
Appendix B: Bibliography

Index to Volume II


CHAPTER 17 
ESTIMATION : LIKELIHOOD 


The Problem 


17.1. On several occasions in previous chapters we have encountered the problem 
of estimating from a sample the values of the parameters of the parent population. We 
have hitherto dealt on somewhat intuitive lines with such questions as arose—for example, 
in the theory of large samples we have taken the means and moments of the sample to be 
satisfactory estimates of the corresponding means and moments in the parent. 

We now proceed to study this branch of the subject in more detail. In the earlier
part of the present chapter we shall examine the sort of criteria which are required of
a “good” estimate and discuss the question whether there exist “best” estimates in
any acceptable sense of the term. In the remainder of the chapter and in Chapter 18
we shall consider various methods of obtaining estimates with the required properties.
In Chapters 19 and 20 we shall look at the same problem from a rather different point of
view and discuss the theories of confidence intervals and fiducial limits.


17.2. It will be evident that if a sample is not random and nothing precise is known
about the nature of the bias operating when it was chosen, very little can be inferred from
it about the parent population. Certain conclusions of a trivial kind are sometimes
possible; for instance, if we take ten turnips from a pile of 100 and find that they weigh ten
pounds altogether, the mean weight of turnips in the pile must be greater than one-tenth of
a pound; but such information is rarely of value, and estimation based on biassed samples
remains very much a matter of individual opinion and cannot be reduced to exact and
objective terms. We shall therefore confine our attention to random samples only. Our
general problem, in its simplest terms, is then to estimate the value of a parameter in the
parent from the information given by the sample. In the first instance we consider
the case when only one parameter is to be estimated. The case of several parameters
will be discussed later.


17.3. Let us in the first place consider what we mean by “estimation”. We know,
or assume as a working hypothesis, that the parent population is distributed in a form
which would be completely determinate if we knew the value of some parameter $\theta$. We
are given a sample of values $x_1, \ldots, x_n$. We require to determine, with the aid of the
$x$’s, a number which can be taken to be the value of $\theta$, or a range of numbers which can
be taken to include that value.

Now a single sample, considered by itself, may be rather improbable, and any estimate
based on it may therefore differ considerably from the true value of $\theta$. It appears,
therefore, that we cannot expect to find any method of estimation which can be guaranteed
to give us a close estimate of $\theta$ on every occasion and for every sample. We must
content ourselves with formulating a rule which will give good results “in the long run”
or “on the average”, or which has “a high probability of success”: phrases which
express the fundamental fact that we have to regard our method of estimation as generating
a population of estimates and to assess its merits according to the properties of this
population.



17.4. It will clarify our ideas considerably if we draw a distinction between the 
method or rule of estimation, which, following Pitman, we shall call an Estimator, and the 
value to which it gives rise in particular cases, the Estimate. The distinction is the same 
as that between a function f (x), regarded as defined for a range of the variable x, and the 
particular value which the function assumes, say f (a), for a specified value of x equal to a. 
Our problem is not to find estimates, but to find Estimators. We do not reject a method 
because it gives a bad result in a particular case (in the sense that the estimate differs 
materially from the true value). We should only reject it if it gave bad results in the long 
run, that is to say, if the population of possible values of the estimator were seriously 
discrepant with the value of $\theta$. The merit of the estimator is judged by the population
of estimates to which it gives rise. It is itself a random variable and has a distribution 
to which we shall frequently have occasion to refer. 


17.5. In the theory of large samples we have often taken as an estimator of a parameter
$\theta$ a statistic $t$ calculated from the sample in exactly the same way as $\theta$ is calculated
from the population, e.g. the sample-mean is taken as an estimate of the parent mean.
Let us examine how this procedure can be justified. Consider the case when the parent
population is

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \theta)^2\}\,dx, \qquad -\infty < x < \infty. \tag{17.1}$$

Requiring an estimator for the parent mean $\theta$, we take

$$t = \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{17.2}$$

The distribution of $t$ is

$$dF = \sqrt{\frac{n}{2\pi}} \exp\{-\tfrac{n}{2}(t - \theta)^2\}\,dt, \tag{17.3}$$

that is to say, $t$ is distributed normally about $\theta$ with variance $1/n$. We notice two things
about this distribution: (a) it has a mean (and median and mode) at the true value $\theta$,
and (b) as $n$ increases, the scatter of possible values of $t$ about $\theta$ becomes smaller, so that
the probability that a given $t$ differs by more than a fixed amount from $\theta$ decreases. We
may say that the accuracy of the estimator increases as $n$ increases, or simply with $n$.

17.6. Generally, it will be clear that the phrase “accuracy increasing with $n$” has
a definite meaning whenever the sampling distribution of $t$ has a variance which decreases
with $1/n$ and a central value which is either identical with $\theta$ or differs from it by a quantity
which also decreases with $1/n$. Many of the estimators with which we are commonly
concerned are of this type, but there are exceptions. Consider, for example, the Cauchy
population

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty. \tag{17.4}$$

The mean (assuming that we conventionally agree that it exists) is at $x = \theta$. But if we
try to estimate $\theta$ by the mean-statistic $t$ we have, for the distribution of $t$,

$$dF = \frac{1}{\pi}\,\frac{dt}{1 + (t - \theta)^2}, \qquad -\infty < t < \infty. \tag{17.5}$$

(Cf. Example 10.1, vol. I, pp. 233-4.) In this case the distribution of $t$ is the same
as that of any single value of the sample, and does not increase in accuracy as $n$ increases.
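
The contrast between the normal and the Cauchy parent can be illustrated numerically. The following Python sketch is not part of the original text: the function names, the seed, the sample sizes and the number of replications are arbitrary choices made for illustration. Since the Cauchy mean has no variance, the scatter of the sample means is measured here by their interquartile range.

```python
import math
import random
import statistics


def cauchy_draw(theta, rng):
    # Inverse-CDF draw from the Cauchy population (17.4) centred at theta.
    return theta + math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_means(draw, n, reps, rng):
    # Interquartile range of `reps` sample means, each from a sample of size n.
    means = [sum(draw(rng) for _ in range(n)) / n for _ in range(reps)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


rng = random.Random(17)   # fixed seed, chosen arbitrarily
theta = 0.0
reps = 4000

# Scatter of the mean-statistic for samples of 1 and of 100.
normal_iqr = {n: iqr_of_means(lambda r: r.gauss(theta, 1.0), n, reps, rng)
              for n in (1, 100)}
cauchy_iqr = {n: iqr_of_means(lambda r: cauchy_draw(theta, r), n, reps, rng)
              for n in (1, 100)}

print("normal parent:", normal_iqr)
print("Cauchy parent:", cauchy_iqr)
```

Under these assumptions the interquartile range for the normal parent should shrink roughly tenfold between $n = 1$ and $n = 100$, while for the Cauchy parent it should stay near 2 for both sample sizes.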




Consistence 


17.7. The property of possessing increasing accuracy is evidently a very desirable
one; and indeed, if the variance of the sampling distribution decreases with increasing
$n$ it is necessary that its central value should tend to $\theta$, for otherwise the estimator would
have values differing systematically from the true value and would be useless, not to say
dangerous. We therefore formulate our first criterion for a suitable estimator as follows:

An estimator $t_n$, computed from a sample of $n$ values, will be said to be a consistent
estimator of $\theta$ if, for any positive $\epsilon$ and $\eta$, however small, there is some $N$ such that the
probability that

$$|t_n - \theta| < \epsilon \tag{17.6}$$

is greater than $1 - \eta$ for all $n > N$. In the notation of the theory of probability,

$$P\{|t_n - \theta| < \epsilon\} > 1 - \eta, \qquad n > N. \tag{17.7}$$

The definition bears an obvious analogy to the definition of convergence in the
mathematical sense. Given any fixed small quantity $\epsilon$ we can find a large enough sample number
such that for all samples over that size the probability that $t$ differs from the true value
by more than $\epsilon$ is as near zero as we please. $t_n$ is said to converge in probability to $\theta$. Thus
$t$ is a consistent estimate of $\theta$ if it converges to $\theta$ in probability.


Example 17.1 


The sample mean is a consistent estimator of the parameter $\theta$ in the population (17.1).
This we have already established in general argument, but more formally the proof would
proceed as follows:

Suppose we are given $\epsilon$. From (17.3) we see that $(t - \theta)\sqrt{n}$ is distributed normally
about zero with unit variance. Thus the probability that $|(t - \theta)\sqrt{n}| < \epsilon\sqrt{n}$ is the
value of the normal integral between limits $\pm\epsilon\sqrt{n}$. Given any positive $\eta$, we can
always take $n$ large enough for this quantity to be greater than $1 - \eta$ and it will continue
to be so for any larger $n$. $N$ may therefore be determined and the inequality (17.7) is
satisfied.
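
The determination of $N$ in this example can be carried out explicitly. The Python sketch below is an illustration added here, not the author's procedure: the function names and the trial values $\epsilon = 0.1$, $\eta = 0.05$ are assumptions. It uses the fact that the probability in (17.7) equals $2\Phi(\epsilon\sqrt{n}) - 1$ for the normal parent (17.1), where $\Phi$ is the unit normal distribution function.

```python
import math
from statistics import NormalDist


def coverage(eps, n):
    # P{|t - theta| < eps} when t is normal about theta with variance 1/n.
    return 2.0 * NormalDist().cdf(eps * math.sqrt(n)) - 1.0


def threshold_n(eps, eta):
    # Largest N for which the inequality can still fail: the probability
    # exceeds 1 - eta exactly when eps * sqrt(n) > z, i.e. n > (z / eps)**2.
    z = NormalDist().inv_cdf(1.0 - eta / 2.0)
    return math.floor((z / eps) ** 2)


eps, eta = 0.1, 0.05          # illustrative values only
N = threshold_n(eps, eta)
print(N, coverage(eps, N), coverage(eps, N + 1))
```

For every $n > N$ the computed probability exceeds $1 - \eta$, which is the requirement of (17.7).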


Example 17.2 


Suppose we have a statistic $t_n$ whose mean value differs from $\theta$ by order $n^{-1}$, whose
variance $v_n$ is of order $n^{-1}$ and which tends to normality as $n$ increases. Clearly
$(t_n - \theta)/\sqrt{v_n}$ will then tend to zero in probability and $t_n$ will be consistent. This covers
a great many statistics encountered in practice.


Unbiassed Estimators 
17.8. The property of consistence is a limiting property, that is to say, it concerns
the behaviour of an estimator as the sample number tends to infinity. It requires nothing
of the behaviour for finite $n$, and if there exists one consistent estimator $t_n$ we may construct
infinitely many others; e.g.

$$\frac{n - a}{n - b}\,t_n$$

is also consistent. We have seen that in some circumstances a consistent estimator of the
mean is the sample mean

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{17.8}$$

But so is

$$\bar{x}' = \frac{1}{n - 1}\sum_{j=1}^{n} x_j. \tag{17.9}$$

Why do we prefer one to the other? Intuitively it seems absurd to divide the sum of
$n$ quantities by anything other than their number $n$. We shall see in a moment, however,
that intuition is not a very reliable guide on such matters. There are reasons for preferring

$$\frac{1}{n - 1}\sum_{j=1}^{n} (x_j - \bar{x})^2 \tag{17.10}$$

to

$$\frac{1}{n}\sum_{j=1}^{n} (x_j - \bar{x})^2 \tag{17.11}$$

as an estimator of the parent variance, notwithstanding that the latter is the sample
variance.

17.9. Consider the sampling distribution of an estimator $t$. If the estimator is
consistent, its distribution must, for large samples, have a central value in the neighbourhood
of $\theta$. We may choose among the field of consistent estimators by requiring that
$\theta$ shall be equated to this central value not merely for large, but for all samples. Whether
we choose as the appropriate central value the mean, the median or the mode is to some
extent a matter of taste. We shall consider below what follows if we select the mode
(which gives us the maximum likelihood estimators). For the present we discuss the mean.

If we require that for all $n$ the mean value of $t$ shall be $\theta$, we define what is known as
an unbiassed estimator:

$$E(t) = \theta. \tag{17.12}$$

This is an unfortunate word, like so many in statistics. There is nothing except convenience
to exalt the arithmetic mean above other measures of location as a criterion of
bias. We might equally well have chosen the mode as determining the “unbiassed”
estimator, in which case the mean estimator would be “biassed” whenever it gave a different
result. Since the use of “unbiassed” in connection with the mean is fairly widespread,
however, we shall continue to use it.*


* The word has already occurred in vol. I, p. 200, in this sense. It may be spelt with either one
or two s’s. My usage, I am afraid, is not consistent, but in this volume I use two.


Example 17.3

Since

$$E\left\{\frac{1}{n}\sum x\right\} = \frac{1}{n}\sum \{E(x)\} = \mu_1',$$

the mean-statistic is an unbiassed estimator of the parent mean whenever the latter exists.
But the sample-variance is not an unbiassed estimator of the parent variance. We have

$$\begin{aligned}
E\left\{\sum (x - \bar{x})^2\right\} &= E\left\{\sum x^2 - \frac{1}{n}\Big(\sum x\Big)^2\right\} \\
&= \frac{n - 1}{n}\sum E(x^2) - \frac{1}{n}\sum_{j \neq k} E(x_j x_k) \\
&= (n - 1)\,\mu_2' - (n - 1)\,\mu_1'^2 \\
&= (n - 1)\,\mu_2.
\end{aligned}$$

Thus $\frac{1}{n}\sum (x - \bar{x})^2$ has a mean value $\frac{n - 1}{n}\mu_2$. On the other hand, an unbiassed estimator
is given by

$$\frac{1}{n - 1}\sum (x - \bar{x})^2,$$

and for this reason it is sometimes preferred to the sample variance. There are other
reasons which will appear when we come to study the analysis of variance.
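
The result $E\{\sum (x - \bar{x})^2\} = (n - 1)\mu_2$ can be checked exactly on a small discrete parent. The Python sketch below is an added illustration, not part of the original text: the four-valued population and the sample size are arbitrary assumptions, and random sampling is represented by enumerating every ordered sample drawn with replacement, each being equally probable.

```python
from fractions import Fraction
from itertools import product


def expected_sum_of_squares(pop, n):
    # Exact mean of sum (x - xbar)^2 over all equally likely ordered
    # samples of size n drawn with replacement from `pop`.
    total = Fraction(0)
    for sample in product(pop, repeat=n):
        xbar = Fraction(sum(sample), n)
        total += sum((x - xbar) ** 2 for x in sample)
    return total / len(pop) ** n


pop = [0, 1, 3, 8]            # hypothetical parent, equal frequencies
n = 3

mu1 = Fraction(sum(pop), len(pop))                    # parent mean
mu2 = sum((x - mu1) ** 2 for x in pop) / len(pop)     # parent variance

e_ss = expected_sum_of_squares(pop, n)
print("E{sum (x - xbar)^2} =", e_ss)
print("(n - 1) * mu2       =", (n - 1) * mu2)
print("mean of the sample variance =", e_ss / n)
print("mean of the divisor-(n-1) form =", e_ss / (n - 1))
```

Exact rational arithmetic is used so that the comparison is an identity rather than an approximation.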


Efficient Estimators 


17.10. In general there will exist more than one consistent estimator of a parameter,
even if we confine ourselves only to unbiassed estimators. Consider once again the
estimation of the mean of a normal population with known variance. The sample mean is
consistent and unbiassed. We will now prove that the same is true of the median.

Consideration of symmetry is enough to show that the median is an unbiassed estimate
of the parent mean, which is, of course, the same as the parent median. For large $n$ the
distribution of the median tends to the normal form (cf. Example 9.7, vol. I, p. 213),

$$dF \propto \exp\{-2n f_1^2 (x - \theta)^2\}\,dx, \tag{17.13}$$

where $f_1$ is the median ordinate of the parent, in our present case $1/\sqrt{2\pi} = 0.3989$. The
variance, $\pi/2n$, tends to zero and the estimator is consistent.


17.11. We are therefore at liberty to seek for further criteria to choose between 
estimators with the common property of consistence. Such a criterion arises naturally 
if we consider the sampling variances of the estimators. Generally speaking, the estimator 
with the smaller variance will be grouped more closely round the value $\theta$; this will certainly
be so for distributions of the normal type. An estimator with a smaller variance will 
therefore deviate less, on the average, from the true value than one with a larger variance. 
Hence we may reasonably regard it as better or more efficient. 

If, of two consistent estimators $t_1$ and $t_2$, we have var $t_1$ < var $t_2$ for all $n$, then $t_1$ is
more efficient than $t_2$ for all sample sizes. It is possible to have var $t_1$ < var $t_2$ for some
ranges of $n$ and var $t_1$ > var $t_2$ for others, in which case the estimators are more or less
efficient in different ranges.

In the case of mean and median we have, for any $n$,

$$\text{var (mean)} = \frac{\sigma^2}{n}, \tag{17.14}$$

and for large $n$

$$\text{var (median)} = \frac{\pi\sigma^2}{2n}, \tag{17.15}$$

where $\sigma^2$ is the parent variance. Since $\pi/2 = 1.57 > 1$ the mean is more efficient than
the median for large $n$ at least. For small $n$ we have to work out the variance of the median.
The following values may be obtained from those given in Table XXIII of Tables for
Statisticians and Biometricians, Part II:

    n                                    2      3      4      5
    var (median), in units of σ²/n      1.00   1.35   1.19   1.44

It appears that the mean is always more efficient than the median in estimating the
parameter $\theta$ for the normal distribution (17.1).
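
The small-sample figures may be checked roughly by simulation. The Python sketch below is an added illustration under stated assumptions: unit parent variance, an arbitrary seed and 40,000 replications per sample size. It estimates $n \times$ var (median), which is the variance of the median expressed in units of $\sigma^2/n$.

```python
import random
import statistics


def scaled_variance_of_median(n, reps, rng):
    # Monte Carlo estimate of n * var(median) for samples of size n
    # from a normal parent with zero mean and unit variance.
    medians = [statistics.median(rng.gauss(0.0, 1.0) for _ in range(n))
               for _ in range(reps)]
    centre = sum(medians) / reps
    return n * sum((m - centre) ** 2 for m in medians) / reps


rng = random.Random(1948)     # fixed seed, chosen arbitrarily
estimates = {n: scaled_variance_of_median(n, 40000, rng) for n in (2, 3, 4, 5)}

for n, value in estimates.items():
    print(n, round(value, 2))
```

The estimates carry sampling error of about 0.01, so only agreement to roughly two decimal places with the tabulated values should be expected. For $n = 2$ the median coincides with the mean, which is why the value is unity.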




Example 17.4 


For the Cauchy distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,$$

we have already seen that the sample mean is not a consistent estimator. However, for
the median in large samples we have, since the median ordinate is $1/\pi$,

$$\text{var (median)} = \frac{\pi^2}{4n}.$$

It is seen that the median is consistent, and although direct comparison with the mean
is not possible because the latter does not possess a sampling variance, the median is
evidently a better estimator for $\theta$ than the mean. This provides an interesting contrast with
the case of the normal parent, particularly in view of the similarity of the parent
frequency-distributions.


17.12. In some cases, as we shall see below, there exist consistent estimators whose
sampling variance for large samples is less than that of any other such estimator. We
shall call such estimators most-efficient. When they exist they provide a standard of
measurement of efficiency. In fact, if $t_2$ has variance $v_2$ and the most-efficient estimator
$t_1$ has variance $v_1$, the efficiency $E$ of $t_2$ is defined as

$$E = \frac{v_1}{v_2}. \tag{17.16}$$

It will be seen later that in normal samples the mean is a most-efficient estimator, so that
the efficiency of the median for such samples is

$$\frac{2}{\pi} = 0.637.$$


17.13. If we have a sample of 100 members the variance of the median (assuming 
normality) will be about the same as that of the mean in only 64 members. Thus, if 
sampling variance be accepted as a criterion of accuracy of estimation, the use of the median 
instead of the mean sacrifices about 36 observations in 100. It is not possible to economise 
by using a different estimator than the mean. 

Other things being equal, the estimator with the greater efficiency is undoubtedly
the one to use. But sometimes other things are not equal. It may, and does, happen
that a most-efficient estimate derived from $t_1$ is more troublesome to calculate than an
alternative $t_2$. The extra labour involved in calculation may be greater than the saving
in dealing with a smaller sample number, particularly if there are plenty of further
observations to hand.
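
The arithmetic behind the figure of 64 members can be set out briefly. The Python sketch below is an added illustration: the function name `efficiency` and the choice $\sigma^2 = 1$ are assumptions, and the large-sample variances (17.14) and (17.15) are taken as given.

```python
import math


def efficiency(v_most_efficient, v_other):
    # Efficiency as defined in (17.16): ratio of the variance of the
    # most-efficient estimator to that of the estimator being judged.
    return v_most_efficient / v_other


n = 100
sigma2 = 1.0                                  # parent variance, assumed unity

var_mean = sigma2 / n                         # (17.14)
var_median = math.pi * sigma2 / (2 * n)       # (17.15), large-sample form

e_median = efficiency(var_mean, var_median)   # equals 2 / pi
equivalent_n = e_median * n                   # size of sample whose mean is as accurate

print(round(e_median, 3), round(equivalent_n))
```

The efficiency does not depend on $n$ or $\sigma^2$ here, since both cancel in the ratio.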


Example 17.5 


Consider the estimation of the standard deviation of a normal population with variance
$\sigma^2$ and unknown mean. Two possible estimators are the standard deviation of the sample
(or the square-root of $\sum (x - \bar{x})^2/(n - 1)$ if it is desired to use an unbiassed estimator)
and the mean deviation of the sample multiplied by $\sqrt{\pi/2}$ (cf. 5.20). The latter is
easier to calculate, as a rule, and if we have plenty of observations (as, for example, if we
are finding the standard deviation of a set of barometric records and the addition of further
members to the sample is merely a matter of turning up more records) it may be worth
while estimating from the mean-deviation rather than from the standard deviation.
In normal samples the variance of the mean-deviation is (9.13)

$$\frac{2\sigma^2}{n\pi}\left(1 - \frac{1}{n}\right)\left\{\frac{\pi}{2} + \sqrt{n(n - 2)} - n + \arcsin\frac{1}{n - 1}\right\} \sim \frac{\sigma^2}{n}\left(1 - \frac{2}{\pi}\right). \tag{17.17}$$

The variance of the estimator from the mean deviation is then approximately

$$\frac{\sigma^2}{n}\left(\frac{\pi - 2}{2}\right). \tag{17.18}$$

Now the variance of the standard deviation is (9.22) $\sigma^2/2n$, and we shall see later that it
is a most-efficient estimator. Thus the efficiency of the estimator from the mean deviation is

$$E = \frac{\sigma^2}{2n} \Big/ \frac{\sigma^2}{n}\left(\frac{\pi - 2}{2}\right) = \frac{1}{\pi - 2} = 0.876.$$

The accuracy of the estimate from the mean deviation of a sample of 1000 is then about
the same as that from the standard deviation of a sample of 876. If it is easier to calculate
the m.d. of 1000 observations than the s.d. of 876 and there is no shortage of observations,
it may be more convenient to use the former.

It has to be remembered, nevertheless, that in adopting such a procedure we are
deliberately wasting information. By taking greater pains we could improve the efficiency
of our estimate from 0.876 to unity, or by about 14 per cent. of the former value.
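
The figure 0.876 can be reproduced from the formulae of this example. The Python sketch below is an added illustration, not part of the original text: the function names are hypothetical, and the finite-sample expression for the variance of the mean deviation is used in the form $(2\sigma^2/n\pi)(1 - 1/n)\{\ldots\}$, which is an assumption about the exact prefactor. It is checked against the case $n = 2$, where the mean deviation is $|x_1 - x_2|/2$ and its variance is $\sigma^2(\tfrac{1}{2} - 1/\pi)$.

```python
import math


def var_mean_deviation(n, sigma2=1.0):
    # Variance of the sample mean deviation in normal samples (n >= 2),
    # in the form assumed above for (17.17).
    braces = (math.pi / 2 + math.sqrt(n * (n - 2)) - n
              + math.asin(1.0 / (n - 1)))
    return 2.0 * sigma2 / (n * math.pi) * (1.0 - 1.0 / n) * braces


def efficiency_of_md_estimator(n, sigma2=1.0):
    # Estimator of sigma: sqrt(pi/2) * (mean deviation), so its variance is
    # (pi/2) * var(m.d.); compared with the large-sample variance
    # sigma^2 / (2n) of the standard deviation.
    return (sigma2 / (2 * n)) / ((math.pi / 2) * var_mean_deviation(n, sigma2))


limit = 1.0 / (math.pi - 2.0)
print(round(limit, 3))
print(round(efficiency_of_md_estimator(10 ** 6), 3))
print(round(1000 * limit))    # sample size for the s.d. matching 1000 for the m.d.
```

The ratio computed for finite $n$ is only indicative, because $\sigma^2/2n$ is itself a large-sample variance; the limiting value is the quantity quoted in the example.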


Sufficient Estimators 

17.14. The comparison of the efficiencies of two estimators, as measured by their
variances, may be made for any $n$, but the absolute efficiency as defined in 17.12 by relation
to a most-efficient estimator is in the main a limiting property. We shall see below (17.36)
that the definition may be extended to small samples and to non-normal variation, but
most-efficient estimators for finite $n$ do not exist so frequently in statistical practice
as in the limiting case of large samples. Sometimes, however, there are estimators which
may be regarded as the “best” for samples of any size, and we proceed to consider
them.

Before doing so, we prove that, in the limit, all most-efficient estimators tend to
equivalence.

More precisely, if two most-efficient estimators $t_1$ and $t_2$ tend in the limit to be
distributed in the bivariate form

$$dF \propto \exp\left[-\frac{1}{2v(1 - \rho^2)}\left\{(t_1 - \theta)^2 - 2\rho\,(t_1 - \theta)(t_2 - \theta) + (t_2 - \theta)^2\right\}\right] dt_1\,dt_2, \tag{17.19}$$

then the correlation $\rho = 1$. Here $v$ is the variance of each estimator.

Consider the estimator

$$u_1 = \tfrac{1}{2}(t_1 + t_2).$$

Clearly $u_1$ is consistent since $t_1$ and $t_2$ are both so. Putting

$$u_2 = \tfrac{1}{2}(t_1 - t_2)$$

we have, for the joint distribution of $u_1$ and $u_2$,

$$dF \propto \exp\left[-\frac{1}{v(1 - \rho^2)}\left\{(1 - \rho)(u_1 - \theta)^2 + (1 + \rho)\,u_2^2\right\}\right] du_1\,du_2. \tag{17.20}$$

Thus $u_2$ is distributed independently of $u_1$ and $\theta$ and we have

$$\text{var } u_1 = \frac{v(1 - \rho^2)}{2(1 - \rho)} = \tfrac{1}{2}(1 + \rho)\,v. \tag{17.21}$$

Now $t_1$ is a most-efficient estimator and hence

$$\tfrac{1}{2}(1 + \rho)\,v = \text{var } u_1 \geq \text{var } t_1 = v,$$

giving

$$\rho \geq 1. \tag{17.22}$$

But $\rho$ cannot be greater than unity and hence $\rho = 1$, which proves the theorem.
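
The change of variables from (17.19) to (17.20) can be verified numerically. The Python sketch below is an added check, not part of the original argument: the function names and the trial values are arbitrary assumptions. It evaluates the exponent in both forms at one point and also evaluates var $u_1 = \tfrac{1}{2}(1 + \rho)v$.

```python
def exponent_in_t(t1, t2, theta, v, rho):
    # Exponent of the bivariate form (17.19).
    a, b = t1 - theta, t2 - theta
    return -(a * a - 2.0 * rho * a * b + b * b) / (2.0 * v * (1.0 - rho ** 2))


def exponent_in_u(u1, u2, theta, v, rho):
    # Exponent of (17.20), with u1 = (t1 + t2)/2 and u2 = (t1 - t2)/2.
    return -((1.0 - rho) * (u1 - theta) ** 2
             + (1.0 + rho) * u2 ** 2) / (v * (1.0 - rho ** 2))


def var_u1(v, rho):
    # Variance of the average of two estimators, each with variance v
    # and with correlation rho, as in (17.21).
    return 0.5 * (1.0 + rho) * v


t1, t2, theta, v, rho = 1.3, -0.4, 0.2, 0.5, 0.6   # arbitrary trial values
u1, u2 = 0.5 * (t1 + t2), 0.5 * (t1 - t2)

print(exponent_in_t(t1, t2, theta, v, rho), exponent_in_u(u1, u2, theta, v, rho))
print(var_u1(v, rho), v)
```

For any $\rho < 1$ the average $u_1$ has a smaller variance than $v$, which is the contradiction on which the proof rests.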


17.15. Consider once again the estimation of $\theta$ in the normal population (17.1).
The joint distribution of the sample is given by

$$dF = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n} (x_j - \theta)^2\right\} dx_1 \ldots dx_n. \tag{17.23}$$

We have the familiar result

$$\sum_{j=1}^{n} (x_j - \theta)^2 = \sum_{j=1}^{n} (x_j - \bar{x})^2 + n(\bar{x} - \theta)^2,$$

and hence

$$dF = \frac{1}{(2\pi)^{n/2}} \exp\left\{-\tfrac{n}{2}(\bar{x} - \theta)^2\right\} \exp\left\{-\tfrac{1}{2}\sum (x - \bar{x})^2\right\} dx_1 \ldots dx_n. \tag{17.24}$$

Thus the frequency function of the distribution of $x$’s (which is equivalent to the likelihood
function) can be factorised into two parts, one depending on $\bar{x}$ and $\theta$, the other depending
on the $x$’s but not on $\theta$.

The quantity $\bar{x}$ is then said to be a sufficient estimator of $\theta$; and generally, if the
likelihood function is expressible in the form (as a product of two frequency functions)

$$L(x_1, \ldots, x_n, \theta) = L_1(t, \theta)\,L_2(x_1, \ldots, x_n), \tag{17.25}$$

where $L_1$ does not contain the $x$’s otherwise than in the form $t$ and $L_2$ is independent of $\theta$,
$t$ is said to be a sufficient estimator of $\theta$.
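
The factorisation (17.24) can be checked numerically. The Python sketch below is an added illustration: the function names and the two trial samples are assumptions made here. It works with logarithms, confirms that the two factors reproduce the full likelihood, and shows that two samples with the same mean give the same ratio of likelihoods for two values of $\theta$.

```python
import math


def log_likelihood(xs, theta):
    # Logarithm of the joint frequency function (17.23).
    n = len(xs)
    return (-0.5 * n * math.log(2.0 * math.pi)
            - 0.5 * sum((x - theta) ** 2 for x in xs))


def log_factors(xs, theta):
    # Logarithms of the two factors in (17.24): the first involves the
    # sample only through xbar, the second does not involve theta.
    n = len(xs)
    xbar = sum(xs) / n
    log_l1 = -0.5 * n * (xbar - theta) ** 2
    log_l2 = (-0.5 * n * math.log(2.0 * math.pi)
              - 0.5 * sum((x - xbar) ** 2 for x in xs))
    return log_l1, log_l2


sample_a = [1.0, 2.0, 3.0]    # hypothetical samples with the same mean
sample_b = [2.0, 2.0, 2.0]

ratio_a = log_likelihood(sample_a, 1.0) - log_likelihood(sample_a, 0.0)
ratio_b = log_likelihood(sample_b, 1.0) - log_likelihood(sample_b, 0.0)
print(ratio_a, ratio_b)
```

The second factor cancels in any comparison between values of $\theta$, so only $\bar{x}$ affects the comparison.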


17.16. As so defined, a sufficient estimator, if it exists at all, is unique except that
if $t$ obeys the relation (17.25) any function of $t$ will obviously also obey the same relation.
From all such functions we must evidently choose one which gives a consistent estimator
and can sometimes, as in the example of the previous section, find the estimator which is
unbiassed. Apart from such ambiguities, which offer no difficulties in practice, the property
of uniqueness holds. For if $t_1$ and $t_2$ were two different sufficient statistics, not functionally
related, we should have

$$L_1(t_1, \theta)\,L_2(x_1, \ldots, x_n) = M_1(t_2, \theta)\,M_2(x_1, \ldots, x_n)$$

and hence

$$\frac{L_1(t_1, \theta)}{M_1(t_2, \theta)} = \frac{M_2}{L_2}. \tag{17.26}$$

Since the expression on the right does not contain $\theta$, $L_1$ must be a factor of $M_1$ and
moreover the quotient must be a constant; for if it were a function of the $x$’s that function
would have been assimilated to $L_2$ or $M_2$.

Hence

$$L_1(t_1, \theta) = k\,M_1(t_2, \theta),$$

and this cannot be so unless $t_1$ and $t_2$ are functionally related.


17.17. The fundamental property of sufficient estimators derives from the following theorem:

If $t_1$ is sufficient and $t_2$ is any other estimator of $\theta$ (not a function of $t_1$) the joint distribution of $t_1$ and $t_2$ may be put in the form

$$dF = f_1(t_1, \theta)\, f_2(t_1, t_2)\, dt_1\, dt_2, \tag{17.27}$$

where $f_2$ does not contain $\theta$. Conversely, if (17.27) holds for every $t_2$ then $t_1$ is sufficient.

Before proving this result let us notice its importance. From (17.27) it follows that for any given $t_1$ the distribution of $t_2$ is equal to $f_2(t_1, t_2)\, dt_2$, i.e. is independent of $\theta$. Consequently, if we know $t_1$, the probability of any range of values of $t_2$ is the same for all $\theta$. The distribution of $t_2$ given $t_1$, therefore, can throw no light whatever on $\theta$. Thus, a knowledge of $t_1$ gives all the information that the sample can supply about $\theta$ and no other estimator can add anything to it. We are clearly justified in such circumstances in describing a sufficient estimator as the "best".

Now as to the theorem itself. The direct part is easily proved. In fact, we have from (17.25):

$$dF = L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

Make the transformation

$$\begin{aligned} y_1 &= t_1(x_1, \ldots, x_n) \\ y_2 &= t_2(x_1, \ldots, x_n) \\ y_3 &= x_3 \\ &\;\;\vdots \\ y_n &= x_n \end{aligned} \tag{17.28}$$

The element of frequency becomes

$$L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n)\, \frac{\partial(x_1, x_2)}{\partial(t_1, t_2)}\, dy_1 \cdots dy_n, \tag{17.29}$$

where the $t$'s and $x$'s are to be expressed in terms of the $y$'s. We have excluded the case when $t_2$ is functionally related to $t_1$, and hence the Jacobian $\partial(x_1, x_2)/\partial(t_1, t_2)$ does not vanish identically. The frequency element of $y_1$ and $y_2$ is then obtained from (17.29) by integrating out the other variables. Since $y_1$ and $y_2$ are equal respectively to $t_1$ and $t_2$ this process will leave unchanged the function $L_1(t_1, \theta)$ and reduce the other part to a function of $t_1$ and $t_2$, say $f_2(t_1, t_2)$. Writing $f_1$ for $L_1$ we then have

$$dF = f_1(t_1, \theta)\, f_2(t_1, t_2)\, dt_1\, dt_2,$$

as stated in the theorem.
The converse is a little more difficult. Let $t_1$ be sufficient and make the transformation $y_1 = t_1$, $y_2 = x_2$, etc. The joint distribution of sample values becomes

$$L'(t_1, y_2, \ldots, y_n, \theta)\, \left|\frac{\partial x_1}{\partial t_1}\right|\, dt_1\, dy_2 \cdots dy_n. \tag{17.30}$$

Since $t_1$ is independent of $\theta$, so is $\partial t_1/\partial x_1$. Hence, if the distribution of $t_1$ is $f(t_1)\, dt_1$, $L'$ may be written

$$f(t_1)\, L''(t_1, y_2, \ldots, y_n), \tag{17.31}$$

and the converse will be established if we can show that $L''$ does not contain $\theta$. This we do by demonstrating that if there are values $y'_2, \ldots, y'_n$ for which $L''$ assumes different values for different values of $\theta$ then the joint distribution of $t_1$ and $t_2$ cannot be independent of $\theta$, which contradicts our hypothesis.

Suppose, then, that for two values of $\theta$, say $\theta_1$ and $\theta_2$,

$$L''(t_1, y'_2, \ldots, y'_n)_{\theta_1} = L''(t_1, y'_2, \ldots, y'_n)_{\theta_2} + \epsilon, \tag{17.32}$$

where $\epsilon$ is not zero; we may take $\epsilon$ positive, interchanging $\theta_1$ and $\theta_2$ if necessary. Consider a new statistic $t_2$ defined by

$$t_2^2 = \sum_{j=2}^{n} (y_j - y'_j)^2. \tag{17.33}$$

Assuming that $L''$ is continuous in the $y$'s, we may determine a value of $t_2$, say $t_2'$, such that

$$L''(t_1, y_2, \ldots, y_n)_{\theta_1} > L''(t_1, y_2, \ldots, y_n)_{\theta_2} \tag{17.34}$$

everywhere inside the range of values bounded by

$$t_2'^{\,2} = \sum (y - y')^2.$$

Then for any fixed $t_1$ the total frequency inside this range is obtained by integrating $L'$ over the appropriate values, and we shall find, in virtue of (17.34),

$$f_{\theta_1} > f_{\theta_2}, \tag{17.35}$$

the $f$'s referring to total frequencies.

But if the joint distribution of $t_1$ and $t_2$ is

$$dF = h(t_1, t_2)_{\theta}\, dt_1\, dt_2$$

we have for the frequencies $f$,

$$f_{\theta_1} = \int_0^{t_2'} h(t_1, t_2)_{\theta_1}\, dt_2, \qquad f_{\theta_2} = \int_0^{t_2'} h(t_1, t_2)_{\theta_2}\, dt_2,$$

and hence

$$\int_0^{t_2'} \left\{ h(t_1, t_2)_{\theta_1} - h(t_1, t_2)_{\theta_2} \right\} dt_2 > 0,$$

so that the joint distribution cannot be independent of $\theta$.

The above demonstration relates to the case when the frequency functions are continuous. In the discontinuous case the argument simplifies and we leave it to the reader to supply the proof.


17.18. We now prove an important further result to the effect that a sufficient estimator is most-efficient, provided that a most-efficient estimator exists. We assume that the joint distribution of the sufficient estimator $t_1$ and any other estimator $t_2$ tends to normality for large $n$, say in the form

$$dF \propto \exp\left[ -\frac{1}{2(1-\rho^2)} \left\{ \frac{(t_1-\theta)^2}{v_1} - \frac{2\rho\,(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{(t_2-\theta)^2}{v_2} \right\} \right] dt_1\, dt_2, \tag{17.36}$$

where $v_1$ and $v_2$ are the variances of $t_1$ and $t_2$ respectively. Since $t_1$ is sufficient, the distribution of $t_2$ given $t_1$ does not contain $\theta$. Now the distribution of $t_1$ is

$$dF \propto \exp\left\{ -\frac{(t_1-\theta)^2}{2 v_1} \right\} dt_1, \tag{17.37}$$

and hence that of $t_2$ given $t_1$ is

$$dF \propto \exp\left[ -\frac{1}{2(1-\rho^2)} \left\{ \frac{(t_2-\theta)^2}{v_2} - \frac{2\rho\,(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{\rho^2 (t_1-\theta)^2}{v_1} \right\} \right] dt_2,$$

which reduces to

$$dF \propto \exp\left[ -\frac{1}{2(1-\rho^2)\, v_2} \left\{ (t_2-\theta) - \rho \sqrt{\frac{v_2}{v_1}}\, (t_1-\theta) \right\}^2 \right] dt_2. \tag{17.38}$$

If this is not to involve $\theta$ we must have

$$\rho = \sqrt{\frac{v_1}{v_2}} = \sqrt{E}, \text{ where } E \text{ is the efficiency of } t_2. \tag{17.39}$$

Since $\rho \leq 1$ it follows that $v_1 \leq v_2$, i.e. $t_1$ has a variance no larger than that of any other estimator. Consequently, if there exists a most-efficient statistic, $t_1$ itself is most-efficient.


17.19. The criterion of sufficiency is not a limiting property. A sufficient estimator is best for any sample size since it gives all the information about $\theta$ that the sample can give; and it is most-efficient for large samples. If we could always find a sufficient estimator our problem would be solved, but unfortunately sufficiency is the exception rather than the rule.


Example 17.6

The frequency element of a sample of $n$ from the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \frac{(x-m)^2}{\sigma^2} \right\} dx$$

can be put in the form

$$dF = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{n}{2} \frac{(\bar{x}-m)^2}{\sigma^2} \right\} d\bar{x} \;\cdot\; \frac{n^{\frac{n-1}{2}}}{(2\sigma^2)^{\frac{n-1}{2}}\, \Gamma\!\left(\frac{n-1}{2}\right)}\, e^{-\frac{n s^2}{2\sigma^2}}\, (s^2)^{\frac{n-3}{2}}\, ds^2.$$

(Cf. Example 10.5, vol. I, p. 238.)

If we know $\sigma$, then, as we have already seen, $\bar{x}$ is sufficient for $m$. But if we know $m$, $s$ is not sufficient for $\sigma$. In fact, the factorisation in the above equation requires the appearance of $\sigma$ in the element relating to $\bar{x}$, and we cannot separate a factor containing $s$ and $\sigma$ alone or the remaining variables alone.

This is what we might expect. If we know the real mean $m$ there is little point in preferring the sample variance

$$s^2 = \frac{1}{n} \sum (x - \bar{x})^2$$

to the second moment

$$s'^2 = \frac{1}{n} \sum (x - m)^2$$

as an estimator of the parent variance. The distribution of $s'$ is given by

$$dF = \frac{n^{\frac{n}{2}}}{(2\sigma^2)^{\frac{n}{2}}\, \Gamma\!\left(\frac{n}{2}\right)}\, e^{-\frac{n s'^2}{2\sigma^2}}\, (s'^2)^{\frac{n-2}{2}}\, ds'^2,$$

and this embodies the whole of the frequency element of the sample, apart from differentials in the other variables. Thus $s'$ is sufficient for $\sigma$.


17.20. This completes the first stage of our inquiry. The criteria of consistence, efficiency and sufficiency provide standards which we shall look for in "good" estimators. Of themselves, however, they do not provide any systematic way of deriving estimators which obey them. We shall now consider various methods which have been proposed for providing estimators and examine how far they conform to our criteria. The most important method is that of maximum likelihood, which will occupy the remainder of this chapter. In the next chapter we shall consider four others, the method of minimum variance, the method of minimum $\chi^2$, the method of least squares, and the method of inverse probability.


Maximum Likelihood

17.21. If the frequency function of the parent population is $f(x, \theta)$, the likelihood function of a sample of $n$ is, by definition,

$$L = f(x_1, \theta)\, f(x_2, \theta) \cdots f(x_n, \theta). \tag{17.40}$$

The Principle of Maximum (or Maximal) Likelihood then states that if there exists a statistic $t = t(x_1, \ldots, x_n)$ which maximises $L$ for variations of $\theta$, then $t$ is to be taken as an estimator of $\theta$. In short, $t$ is the solution (if any) of

$$\frac{\partial L}{\partial \theta} = 0, \qquad \frac{\partial^2 L}{\partial \theta^2} < 0. \tag{17.41}$$

Since $L$ is positive, the first equation is equivalent to

$$\frac{1}{L} \frac{\partial L}{\partial \theta} = \frac{\partial \log L}{\partial \theta} = 0, \tag{17.42}$$

a form which is frequently more convenient.

There is one small point to notice here. In our usual convention, if a frequency function has a finite range, we regard it as defined from $-\infty$ to $+\infty$ but as zero outside that range. In this chapter we shall occasionally meet the reciprocal of $f$, which is undefined for zero $f$. Unless the contrary is specified we shall suppose that where $f$ is zero $1/f$ is also to be regarded as zero. This will enable us to continue to regard the range as infinite, but some care is necessary where $f$ is assumed everywhere continuous, for discontinuities may appear in $f$ and $1/f$ at the terminals of the finite range. The point becomes important when we try to make certain existence theorems rigorous.


17.22. In sections 7.27 to 7.31 we touched on the principle of maximum likelihood 
from the point of view of statistical logic. We pointed out that its adoption required a 
new postulate in the theory of inference, but referred to the fact that the principle was 
recommended by the statistical properties of the estimators to which it leads. We now 
proceed to prove a series of theorems about these estimators, from which it will be seen 
that the posterior recommendation, so to speak, is very strong. In fact, maximum 
likelihood estimators are consistent, tend to normality for large n, have minimum variance 
in the limit at least, and provide sufficient statistics where such exist. 


17.23. The reader may feel convinced intuitively that maximum likelihood estimators are consistent, in which case he can pass to the next section. We shall now prove the result formally.

(a) If the frequency function $f(x, \theta)$ is continuous in $x$ throughout its range, and

(b) if $f(x, \theta)$ is continuous and monotonic in $\theta$ in some $\theta$-interval containing the true value of $\theta$, say $\theta_0$, and for all $x$ in some $x$-interval,

then the maximum likelihood estimator of $\theta$, say $t$, is consistent.

Our proof will also cover the case of discontinuous variates which can be reduced to the continuous case by replacing each value by an interval in which the frequency is uniformly distributed.

We first eliminate an inconvenience due to the infinitude of the range. In fact, if the range is infinite we make the variate transformation $x = \tan y$. The conditions (a) and (b) remain true of $y$, and the maximum likelihood estimator in $x$ transforms to that in $y$. We may therefore take the range as finite.

The next step is to reduce the case to one of grouped frequencies by dividing the range into $m$ intervals, the width of the $j$th interval being $l_j$. (We shall decide on the actual values of the $l$'s below.) Writing

$$f_j = \int_{l_j} f(x, \theta)\, dx, \tag{17.43}$$

we have, in virtue of the continuity of $f$ in $x$, that $f_j/l_j$ differs as little as we please from $f(x_j, \theta)$. Then if $L'$ is the likelihood of the grouped data, proportional to

$$\left(\frac{f_1}{l_1}\right)^{n_1} \left(\frac{f_2}{l_2}\right)^{n_2} \cdots \left(\frac{f_m}{l_m}\right)^{n_m}, \tag{17.44}$$

where $n_j$ is the number of observations in the $j$th interval, we have, except for constants,

$$\log L' = \sum_{j=1}^{m} n_j \log f_j - \sum_{j=1}^{m} n_j \log l_j, \tag{17.45}$$

and this will differ arbitrarily little from the logarithm of the true likelihood

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta), \tag{17.46}$$

provided that we take $m$ large enough and the $l$'s in consequence small enough.

Hence we see that if $t$ is the estimator which maximises $L$ and $t'$ that which maximises $L'$, in virtue of hypothesis (b) that $L$ and $L'$ are continuous in $\theta$, $t$ and $t'$ will differ as little as we please for any given values of the $x$'s and of $n$. We may therefore prove our theorem for the finite number of variables $n_j$ and infer its truth for the continuous case by proceeding to the limit.

In different samples the $n_j$ will vary, subject only to the condition that $\sum (n_j) = n$. Let us choose the ranges $l_j$ such that $f_j(\theta_0) = 1/m$ for all $j$, that is to say, such that the frequencies in all intervals are equal when $\theta$ takes its true value $\theta_0$. Consider the likelihood function

$$K = \sum n_j \log z_j, \tag{17.47}$$

where the $z$'s are subject only to the condition

$$\sum (z) = 1. \tag{17.48}$$

We consider three values of $K$ defined by particular values of the $z$'s.

(a) When $z_j = n_j/n$, $K$ is a maximum, say $K_R$. For we have

$$dK = \sum \frac{n_j}{z_j}\, \delta z_j$$

and hence

$$\frac{n_1}{z_1} = \frac{n_2}{z_2} = \cdots = \frac{\sum (n)}{\sum (z)} = n.$$

(b) When $z_j = f_j(\theta_0) = 1/m$, $K$ is, say, $K_M$.

(c) When the estimator $t'$ assumes the value, say, $t_0$ corresponding to the $n_j$'s, and hence $z_j = f_j(t_0)$, $K$ is a maximum, say $K_Z$, among the particular set of values of $\theta$ for which $z_j = f_j(\theta)$; for this is our definition of $t'$.

We have at once that

$$K_R \geq K_Z \geq K_M. \tag{17.49}$$

Now, as the sample increases, the observed $n_j/n$ converge in probability to their theoretical values $f_j(\theta_0) = 1/m$. Since $K$ is continuous in the $z$'s, $K_R - K_M$ will converge to zero in probability and, from (17.49), so will $K_R - K_Z$.

Now we show that this entails that each of

$$\left| f_j(t_0) - f_j(\theta_0) \right| \tag{17.50}$$

converges to zero in probability. In fact, since $\left| f_j(\theta_0) - n_j/n \right|$ does so, it will be enough to prove that the same holds for

$$\left| f_j(t_0) - \frac{n_j}{n} \right|.$$

Let $K_1$ be the maximum of $K$ for some fixed $z_1$. Then $K_R \geq K_1$ and

$$K_R - K_M \geq K_1 - K_M.$$

Hence $K_1 - K_M$ converges to zero. The maximum $K_1$ is readily seen to be given by

$$z_j = \frac{n_j (1 - z_1)}{n - n_1}, \qquad j = 2, \ldots, m, \tag{17.51}$$

$$K_1 = n_1 \log z_1 + (n - n_1) \left\{ \log (1 - z_1) - \log (n - n_1) \right\} + \sum_{j=2}^{m} n_j \log n_j. \tag{17.52}$$

Now $z_1$ is a double-valued function of $K_1$, continuous and having its two values equal for $K_1 = K_R$; for $K_1$ is continuous in $z_1$ from 0 to 1 (not inclusive), and $\partial K_1/\partial z_1$ changes sign only for $z_1 = n_1/n$, where $K_1 = K_R$. It follows that when $K_R - K_1$ is small, so is $z_1 - n_1/n$. If the other $z$'s are not given by (17.51) $K_R - K_1$ is smaller still.

A similar argument applies for any $j$, and hence $\left| z_j - n_j/n \right|$ converges to zero in probability when $K_R - K$ does so. Taking $z_j = f_j(t_0)$ and remembering that in this case $K$ becomes $K_Z$, we reach (17.50).

Finally, by hypotheses (a) and (b) at least some of the $f_j(\theta)$ have continuous inverse functions expressing $\theta$ in terms of the functions $f$, and hence by taking

$$\left| f_j(t_0) - f_j(\theta_0) \right|$$

as small as we please, we may make $t_0 - \theta_0$ as small as we please. Consequently $t'$ converges to $\theta_0$ in probability and is consistent.


17.24. The reader may find the foregoing proof easier to follow if we express its main points in geometrical terminology.

Consider the $m$ proportions $n_j/n$ as the co-ordinates of a point in a space of $m$ dimensions. The theoretical frequencies $f_j(\theta_0) = 1/m$ define a point, say $M$, in this space, and the sample point $R$, corresponding to an observed set of $n_j$'s, may be regarded as varying round the "theoretical" point $M$. The quantities $z$ are the co-ordinates of any point in the hyperplane $\sum (z) = 1$, which contains $M$ and $R$. (See Fig. 17.1.)

Fig. 17.1.

Now, for any sample point $R$ the maximum likelihood estimator $t'$ assumes a value $t_0$ which in general differs from $\theta_0$. This value defines $m$ quantities $f_j(t_0)$ which determine a point $Z$. This also lies in the hyperplane since the sum of the frequencies is unity. Thus the points $R$ determine a set of points $Z$ which all lie on the curve defined for variations in $\theta$ by

$$z_j = f_j(\theta). \tag{17.53}$$

Since $\theta = \theta_0$ is a possible value of $\theta$, the point $M$ lies on this curve; $R$ in general does not.

What we have shown in analytical form is that the function $K$, which is the logarithm of a likelihood function defined for any point on the hyperplane, has a maximum at $R$ and a maximum on the curve itself at $Z$. As the sample size increases, $R$ is as near as we like to $M$ (in the sense of convergence in probability, that is to say, that as high a proportion of points $R$ as we like are as near as we like to $M$). This involves that $Z$ also is as near as we like to $M$. This in turn involves that the parameter-value $t_0$ corresponding to $Z$ is as close as we like to $\theta_0$ for as high a proportion of the possible points $Z$ as we like, which is our theorem.


17.25. We now prove a second fundamental property of maximum likelihood estimators, namely that they tend to normality for large $n$. More precisely,

(a) If condition (a) at the beginning of 17.23 is satisfied; and if (more stringently than condition (b) of that section)

(c) in a $\theta$-interval containing the true value $\theta_0$, $\partial f/\partial \theta$ is continuous in $\theta$ for every $x$, $x^2\, \partial f/\partial \theta$ approaches a continuous function of $\theta$ as $x$ tends to infinity, and $\partial f/\partial \theta$ does not vanish in some interval,

then the maximum likelihood estimator $t$ tends to normality for large $n$. The condition as to $x^2\, \partial f/\partial \theta$ ensures that in the transformation to finite range $\partial f/\partial \theta$ remains continuous in $\theta$ throughout that range.


We recall that if

$$\xi_j = \frac{n_j}{n} - \frac{1}{m}, \tag{17.54}$$

that is, if the $\xi$'s are the deviations of the actual proportional frequencies $n_j/n$ from the "expected" frequencies $1/m$, the distribution of the $\xi$'s in the limit will be normal and their distribution spherically symmetric. Consider again the orthogonal space of the previous section. The sample points are distributed about the point $M$ in a symmetrical form which tends to normality. If we choose a set of orthogonal axes in the hyperplane, the projection of the sample points on any axis is in the limit distributed normally with variance $1/mn$.

In the neighbourhood of $M$ the curve (17.53) approaches its tangent line as $n$ becomes larger, and we therefore have, if $s$ is the distance along the tangent from $M$,

$$s^2 = (\theta - \theta_0)^2 \sum_{j=1}^{m} \left\{ \frac{\partial f_j(\theta)}{\partial \theta} \right\}^2, \tag{17.55}$$

as follows from (17.53). (The tangent exists in virtue of our hypothesis as to the differential coefficients of $f$ in $\theta$.)

Now consider the point $Z$ on the curve corresponding to the sample point $R$. We know that at $Z$ the function

$$K = \sum n_j \log \left( z_j + \frac{1}{m} \right), \tag{17.56}$$

where we now measure $z$ from $M$, is a maximum for variations in $z$ such that $Z$ lies on the curve. $R$ is determined by finding the hypersurface (17.56) tangent to the hyperplane $\sum (z_j) = 0$, for at that point $dK/dz_j$ is zero. We know that the co-ordinates of this point are $z_j = n_j/n - 1/m$ and that $R$ is the point of tangency. $K_R$ as defined in 17.23 is the value of $K$ at $R$, and $K_Z$ is that at $Z$. We then have, by Taylor's theorem,

$$K_Z = K_R + \sum_j \frac{\partial K}{\partial z_j}\, \delta z_j + \frac{1}{2} \sum_{j,k} \frac{\partial^2 K}{\partial z_j\, \partial z_k}\, \delta z_j\, \delta z_k \tag{17.57}$$

to the second order of small quantities in $\delta z$, the derivatives being taken at $R$. From (17.56) we see that

$$\frac{\partial^2 K}{\partial z_j\, \partial z_k} = 0, \qquad j \neq k, \tag{17.58}$$

$$\frac{\partial^2 K}{\partial z_j^2} = -\frac{n_j}{\left( z_j + \frac{1}{m} \right)^2} = -\frac{n^2}{n_j}. \tag{17.59}$$

Hence

$$K_Z - K_R = n \sum_j \delta z_j - \frac{n^2}{2} \sum_j \frac{(\delta z_j)^2}{n_j}. \tag{17.60}$$

Now $\sum (\delta z_j) = 0$, for the variation takes place in the hyperplane. Hence, for given $R$, $Z$ is the point for which $\sum (\delta z_j)^2 / n_j$ is a minimum. As $n$ tends to infinity the $n_j$'s tend to equality, and hence $Z$ is the point on the curve which is nearest to $R$. Thus $R$ is, in the limit, projected orthogonally on to the curve, that is to say, in the limit, on the tangent line.
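Now we know that these points are distributed normally with variance $1/mn$ and this proves the theorem. We may also evaluate the variance of the maximum likelihood estimator; for

$$\operatorname{var} t' = \frac{\operatorname{var} s}{\displaystyle \sum_{j=1}^{m} \left( \frac{\partial f_j}{\partial \theta} \right)^2} = \frac{1}{\displaystyle mn \sum_{j=1}^{m} \left( \frac{\partial f_j}{\partial \theta} \right)^2},$$

and since $t'$ approaches $t$ for fine grouping we have also, remembering that $1/m = f_j(\theta_0)$,

$$\operatorname{var} t = \frac{1}{\displaystyle n \int_{-\infty}^{\infty} \frac{1}{f} \left( \frac{\partial f}{\partial \theta} \right)^2 dx}, \tag{17.61}$$

where $\theta$ is to be put equal to $\theta_0$ on the right.

It may be remarked that condition (c) at the beginning of the section prevents the vanishing of $\partial f/\partial \theta$ which might render the expression (17.61) nugatory.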


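17.26. We have, then, under the afore-mentioned conditions,

$$\frac{1}{\operatorname{var} t} = n \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx. \tag{17.62}$$

If the range is independent of $\theta$, or if $f$ and $\partial f/\partial \theta$ vanish at any extremity of the range which depends on $\theta$, we have the alternative form:

$$\frac{1}{\operatorname{var} t} = -n \int_{-\infty}^{\infty} \left( \frac{\partial^2 \log f}{\partial \theta^2} \right) f\, dx. \tag{17.63}$$

In fact, since $\int_a^b f\, dx = 1$ where $a$, $b$ are the limits of the range and may contain $\theta$, we have *

$$0 = \frac{\partial}{\partial \theta} \int_a^b f\, dx = \int_a^b f\, \frac{\partial \log f}{\partial \theta}\, dx + f(b, \theta)\, \frac{\partial b}{\partial \theta} - f(a, \theta)\, \frac{\partial a}{\partial \theta} = \int_a^b f\, \frac{\partial \log f}{\partial \theta}\, dx.$$

Differentiating again, we have

$$0 = \int_a^b \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx + \int_a^b \left( \frac{\partial^2 \log f}{\partial \theta^2} \right) f\, dx + \left[ f\, \frac{\partial \log f}{\partial \theta} \right]_{x=b} \frac{\partial b}{\partial \theta} - \left[ f\, \frac{\partial \log f}{\partial \theta} \right]_{x=a} \frac{\partial a}{\partial \theta}. \tag{17.64}$$

Again, if the range is independent of $\theta$ or if $\partial f/\partial \theta$ vanishes at the extremity, the last two terms on the right in (17.64) are zero, and we have (reverting to our usual convention as to limits)

$$\int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx = -\int_{-\infty}^{\infty} \left( \frac{\partial^2 \log f}{\partial \theta^2} \right) f\, dx,$$

and the result follows from (17.62).

* The operation of differentiating under the integral sign requires certain conditions as to uniform convergence, even when the limits are independent of $\theta$. To avoid prolixity we shall always assume that the conditions hold unless the contrary is stated. The point gives rise to no statistical difficulty but is troublesome when one is aiming at complete mathematical rigour.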
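The equality of the two forms of the reciprocal variance can be checked numerically for a particular family. The sketch below is only an illustration: it assumes an exponential density with mean theta (range 0 to infinity, independent of the parameter), approximates the derivatives by finite differences and the integrals by Simpson's rule on a truncated range. All function names, the step sizes and the truncation point are choices made here.

```python
import math

def log_f(x, theta):
    # Assumed example density: exponential with mean theta,
    # f(x, theta) = exp(-x / theta) / theta on 0 <= x < infinity.
    return -x / theta - math.log(theta)

def simpson(g, a, b, steps=20000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def information_forms(theta, upper=200.0, eps=1e-4):
    """Return (integral of score^2 * f, minus integral of second derivative * f)."""
    def score(x):
        return (log_f(x, theta + eps) - log_f(x, theta - eps)) / (2 * eps)

    def second(x):
        return (log_f(x, theta + eps) - 2 * log_f(x, theta)
                + log_f(x, theta - eps)) / eps ** 2

    def f(x):
        return math.exp(log_f(x, theta))

    squared = simpson(lambda x: score(x) ** 2 * f(x), 0.0, upper)
    curvature = -simpson(lambda x: second(x) * f(x), 0.0, upper)
    return squared, curvature

squared, curvature = information_forms(2.0)
print(squared, curvature)   # both should be close to 1 / theta**2
```

For this family both integrals equal $1/\theta^2$, so that either form gives a limiting variance of $\theta^2/n$ for the maximum likelihood estimator.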


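17.27. We now prove a third fundamental property concerning the efficiency of maximum likelihood estimates.

If $t$ be any estimator of $\theta$, the range of $f(x, \theta)$ is independent of $\theta$, and in large samples $t$ is distributed normally about mean $\theta_0$ (the true value of $\theta$) with variance $v$; then

$$\frac{1}{v} \text{ cannot exceed } n \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx, \text{ with } \theta = \theta_0;$$

and hence, if a maximum likelihood estimator exists, it is most-efficient in the class of such estimators.

By hypothesis, we have in the limit for the frequency function of $t$,

$$\Phi = \frac{1}{\sqrt{2\pi v}} \exp\left\{ -\frac{(t-\theta)^2}{2v} \right\} \tag{17.65}$$

and hence

$$\frac{\partial^2 \log \Phi}{\partial \theta^2} = -\frac{1}{v}, \tag{17.66}$$

where, for convenience, we drop the suffix of $\theta$ until the end of the proof. We then have

$$\frac{1}{v} = -\int_{-\infty}^{\infty} \frac{\partial^2 \log \Phi}{\partial \theta^2}\, \Phi\, dt = \int_{-\infty}^{\infty} \frac{1}{\Phi} \left( \frac{\partial \Phi}{\partial \theta} \right)^2 dt. \tag{17.67}$$

Now consider

$$u = \frac{\partial}{\partial \theta} (\log L) \tag{17.68}$$

as a random variable over the possible values $x_1, \ldots, x_n$ conditioned by $t$ = constant. Since the frequency of $u$ is $L$, we have

$$\operatorname{var} u = \frac{\sum (L u^2)}{\sum (L)} - \frac{\{\sum (L u)\}^2}{\{\sum (L)\}^2}, \tag{17.69}$$

with summation (or integration) over the range of $x$'s. Now $\Phi$ is the frequency of all samples having a constant $t$, and hence

$$\Phi = \sum (L).$$

Hence

$$\operatorname{var} u = \frac{1}{\Phi} \sum (L u^2) - \frac{1}{\Phi^2} \left\{ \sum (L u) \right\}^2 = \frac{1}{\Phi} \sum \frac{1}{L} \left( \frac{\partial L}{\partial \theta} \right)^2 - \frac{1}{\Phi^2} \left\{ \sum \left( \frac{\partial L}{\partial \theta} \right) \right\}^2. \tag{17.70}$$

Now $\operatorname{var} u$ cannot be negative and $\Phi$ is not negative, and hence

$$\sum \frac{1}{L} \left( \frac{\partial L}{\partial \theta} \right)^2 - \frac{1}{\Phi} \left\{ \sum \left( \frac{\partial L}{\partial \theta} \right) \right\}^2 \geq 0. \tag{17.71}$$

But

$$\sum (L u) = \sum \left( \frac{\partial L}{\partial \theta} \right) = \frac{\partial}{\partial \theta} \sum (L) = \frac{\partial \Phi}{\partial \theta},$$

and hence, substituting in (17.71) and integrating over all $t$, we have

$$\int \sum \frac{1}{L} \left( \frac{\partial L}{\partial \theta} \right)^2 dt \geq \int \frac{1}{\Phi} \left( \frac{\partial \Phi}{\partial \theta} \right)^2 dt = \frac{1}{v}. \tag{17.72}$$

Now $\sum$ is carried out over all $x$ for constant $t$ and the integration over all $t$, so that the two summations together are equivalent to summation over the $x$'s without restriction. Hence

$$\frac{1}{v} \leq \int \cdots \int \frac{1}{L} \left( \frac{\partial L}{\partial \theta} \right)^2 dx_1 \cdots dx_n = \int \cdots \int \left( \frac{\partial \log L}{\partial \theta} \right)^2 L\, dx_1 \cdots dx_n = n \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx,$$

which establishes the result, since the expression on the right is the reciprocal of the variance of the maximum likelihood estimator, if it exists.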


17.28. The fourth fundamental theorem of maximum likelihood estimators is as follows:

If a sufficient estimator exists, it is a function of the maximum likelihood estimator.

In fact, the likelihood can then be put in the form

$$L_1(t, \theta)\, L_2(x_1, \ldots, x_n),$$

where $L_2$ does not contain $\theta$. Hence

$$\frac{\partial}{\partial \theta} \log L = \frac{\partial}{\partial \theta} \log L_1 = \psi(\theta, t), \text{ a function of } \theta \text{ and } t \text{ only.} \tag{17.73}$$

Hence, for fixed $t$, $\partial \log L/\partial \theta$ is constant, and it follows from the previous section that the variance of $t$ is equal to the variance of a most-efficient estimator (for $\operatorname{var} u$ is then zero for fixed $t$ and the inequality (17.72) becomes an equality). Hence the sufficient estimator is most-efficient, confirming the result of 17.18.

It follows from (17.73) that the maximum likelihood estimator is given by

$$\psi(\theta, t) = 0, \tag{17.74}$$

which proves the theorem.

Conversely, if $t$ is such that (17.73) is true, it must be sufficient; for then we have

$$\log L = C + \int \psi(\theta, t)\, d\theta,$$

where $C$ does not depend on $\theta$ and the likelihood is of the requisite form.


Example 17.7

Consider the estimation of the parameter $m$ in the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2} \left( \frac{x-m}{\sigma} \right)^2 \right\} dx, \qquad -\infty < x < \infty,$$

where $\sigma$ is known. The frequency function is easily seen to obey the conditions relating to maximum likelihood estimators. We have

$$\log L = -n \log \left\{ \sigma \sqrt{2\pi} \right\} - \frac{1}{2\sigma^2} \sum_{j=1}^{n} (x_j - m)^2,$$

and hence the maximum likelihood estimator is the root of

$$\frac{\partial \log L}{\partial m} = \frac{1}{\sigma^2} \sum (x_j - m) = 0,$$

giving $\hat{m} = \frac{1}{n} \sum (x) = \bar{x}$.

It is frequently convenient to denote the estimator of a parameter by writing a circumflex accent over it in this way.

In this case the sample mean is the maximum likelihood estimator. It is therefore most-efficient and no other estimator can have a smaller variance in the limit. For the variance we have, from (17.63),

$$\frac{1}{\operatorname{var} \hat{m}} = -n \int_{-\infty}^{\infty} \left( \frac{\partial^2 \log f}{\partial \theta^2} \right)_{\theta = m} f\, dx = n \int_{-\infty}^{\infty} \frac{1}{\sigma^2}\, f\, dx = \frac{n}{\sigma^2},$$

giving the familiar result:

$$\operatorname{var} \bar{x} = \frac{\sigma^2}{n}.$$

This, as it happens, is true for any $n$. The estimator is also sufficient, for

$$\frac{\partial}{\partial m} \log L = \frac{1}{\sigma^2} (n\bar{x} - nm) = \text{a function of } m \text{ and } \bar{x} \text{ only.}$$

The condition that $\sigma^2$ is known is to be noted. Complications arise when two parameters are estimated simultaneously, as we shall see presently.


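Example 17.8

Consider the estimation of $\theta$ in the Type III distribution

$$dF = \frac{1}{\theta\, \Gamma(p)} \left( \frac{x}{\theta} \right)^{p-1} e^{-x/\theta}\, dx, \qquad 0 \leq x < \infty,$$

where $p$ is known.

We have

$$\log f = (p - 1) \log x - \frac{x}{\theta} - \log \Gamma(p) - p \log \theta$$

and hence, dropping terms independent of $\theta$,

$$\log L = -\frac{\sum (x)}{\theta} - np \log \theta.$$

The equation of maximum likelihood is then

$$\frac{1}{\theta^2} \sum (x) - \frac{np}{\theta} = 0,$$

giving $\hat{\theta} = \bar{x}/p$. The variance is given, by (17.63), as

$$\operatorname{var} \hat{\theta} = \frac{\theta^2}{np},$$

where $\theta$ is the true value of the parameter. We could also have obtained this result directly (and again it happens to be true for all $n$). From Example 10.11 (vol. I, p. 244) we have for the distribution of $\bar{x}/p = \hat{\theta}$,

$$dF = \frac{1}{\Gamma(np)} \left( \frac{np}{\theta} \right)^{np} \hat{\theta}^{\,np-1}\, e^{-np\hat{\theta}/\theta}\, d\hat{\theta},$$

from which the first two moments about the origin are

$$\mu'_1 = \theta, \qquad \mu'_2 = \frac{np + 1}{np}\, \theta^2,$$

giving $\operatorname{var} \hat{\theta} = \theta^2/np$.

We note that the likelihood function may be put in the form

$$\log L = (p - 1) \sum \log x - n \log \Gamma(p) - \frac{np\hat{\theta}}{\theta} - np \log \theta,$$

from which it is evident that $\hat{\theta}$ is sufficient.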
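The Type III calculation can be checked on a small made-up sample. The sketch below assumes the density $x^{p-1} e^{-x/\theta} / \{\Gamma(p)\, \theta^p\}$ used in this example; the function names and the five observations are hypothetical and serve only as illustration.

```python
import math

def type3_log_likelihood(xs, theta, p):
    """log L for the Type III density x^(p-1) exp(-x/theta) / (Gamma(p) theta^p)."""
    n = len(xs)
    return ((p - 1) * sum(math.log(x) for x in xs)
            - n * math.lgamma(p)
            - sum(xs) / theta
            - n * p * math.log(theta))

def type3_mle(xs, p):
    """Root of the likelihood equation: theta_hat = mean(x) / p."""
    return sum(xs) / (len(xs) * p)

xs = [1.2, 0.7, 3.1, 2.4, 1.6]   # hypothetical observations
p = 2.0                          # shape parameter, assumed known
theta_hat = type3_mle(xs, p)

# The log-likelihood should be lower on either side of theta_hat.
at_hat = type3_log_likelihood(xs, theta_hat, p)
below = type3_log_likelihood(xs, 0.95 * theta_hat, p)
above = type3_log_likelihood(xs, 1.05 * theta_hat, p)
print(theta_hat, at_hat, below, above)
```

Here the sample total is 9 and $np = 10$, so the estimate is 0.9, and the two neighbouring values of the parameter give a smaller likelihood.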


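Example 17.9

Consider the estimation of the parameter $\lambda$ in the Poisson distribution whose general term is

$$e^{-\lambda}\, \frac{\lambda^x}{x!}.$$

In this case the likelihood function is discontinuous and we have

$$L = \frac{e^{-n\lambda}\, \lambda^{\sum (x)}}{\prod (x!)}.$$

Hence

$$\frac{\partial \log L}{\partial \lambda} = -n + \frac{\sum (x)}{\lambda} = 0,$$

giving $\hat{\lambda} = \bar{x}$, the sample mean.

For the variance we have

$$\frac{1}{\operatorname{var} \hat{\lambda}} = -n \sum_x \left( \frac{\partial^2 \log f}{\partial \lambda^2} \right) f = n \sum_x \frac{x}{\lambda^2}\, f = \frac{n}{\lambda}, \qquad \text{i.e. } \operatorname{var} \hat{\lambda} = \frac{\lambda}{n},$$

a familiar result.

It is easy to see in this case also that $\hat{\lambda}$ is sufficient.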
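The variance in the Poisson case can be verified by direct summation. The sketch below evaluates minus the expected second derivative of the logarithm of the general term, using the fact that this second derivative is $-x/\lambda^2$; the function name, the number of terms retained and the values of the parameter and sample size are assumptions made for the illustration.

```python
import math

def poisson_information(lam, terms=200):
    """-E[d^2 log f / d lambda^2] = sum over x of (x / lam^2) * f(x)."""
    total = 0.0
    pmf = math.exp(-lam)              # f(0)
    for x in range(terms):
        total += (x / lam ** 2) * pmf
        pmf *= lam / (x + 1)          # f(x + 1) obtained from f(x)
    return total

lam = 4.0      # hypothetical true parameter
n = 50         # hypothetical sample size
info = poisson_information(lam)
variance = 1.0 / (n * info)
print(info, variance)   # close to 1/lam and lam/n respectively
```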


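Example 17.10

What is the most general form of distribution, differentiable in $\theta$, for which the sample-mean is the maximum likelihood estimator?

We are given that a solution of

$$\sum \frac{\partial \log f}{\partial \theta} = 0$$

is

$$\theta = \frac{\sum (x)}{n}, \qquad \text{or} \qquad \sum (x - \theta) = 0.$$

This requires that

$$\frac{\partial \log f}{\partial \theta} = K (x - \theta),$$

where $K$ is independent of $x$ but may be dependent on $\theta$, say equal to $\partial^2 \psi/\partial \theta^2$. Then, integrating,

$$\log f = \int \frac{\partial^2 \psi}{\partial \theta^2}\, (x - \theta)\, d\theta = (x - \theta)\, \frac{\partial \psi}{\partial \theta} + \psi + \phi(x),$$

where $\phi(x)$ is an arbitrary function of $x$. Hence

$$f = k \exp\left\{ (x - \theta)\, \frac{\partial \psi}{\partial \theta} + \psi(\theta) + \phi(x) \right\},$$

which is the most general form of $f$.

If $\psi(\theta) = \frac{1}{2}\theta^2$, $\phi(x) = -\frac{1}{2}x^2$, the form becomes the normal distribution

$$f = k \exp\left\{ -\tfrac{1}{2} (x - \theta)^2 \right\}.$$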


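Successive Approximations to Efficient Estimators

17.29. In the examples we have just given, the solution of the maximum likelihood equation was carried out without difficulty. It frequently happens, however, that the equation is by no means so easy to solve explicitly, though it can sometimes be solved for particular values of $x$ by iterative methods. Another possibility is to compute an inefficient estimator and correct it by an extra term, which can be obtained as follows:

Let $t'$ be an inefficient estimator and $t$ a most-efficient estimator. Let

$$\delta = t' - t.$$

Then

$$\operatorname{var} \delta = \operatorname{var} t' + \operatorname{var} t - 2 \operatorname{cov} (t, t'). \tag{17.75}$$

Remembering that if $E$ is the efficiency of $t'$,

$$\operatorname{var} t = E \operatorname{var} t', \qquad \frac{\operatorname{cov} (t, t')}{\sqrt{\operatorname{var} t \operatorname{var} t'}} = \sqrt{E} \quad \text{(see (17.39))},$$

we have

$$\operatorname{var} \delta = \frac{1 - E}{E} \operatorname{var} t. \tag{17.76}$$

If then $t'$ is "nearly" efficient, that is, if $1 - E$ is small, the average value of $\delta = t' - t$ will be small.

If the maximum likelihood equation is

$$\frac{\partial \log L}{\partial \theta} = 0,$$

consider

$$t'' = t' + \operatorname{var} t \left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t'}. \tag{17.77}$$

We have

$$\left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t'} = \left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t} + (t' - t) \left( \frac{\partial^2 \log L}{\partial \theta^2} \right)_{\theta = t} + \text{terms of higher order} = (t' - t) \left( \frac{\partial^2 \log L}{\partial \theta^2} \right)_{\theta = t}. \tag{17.78}$$

For large $n$, approximately

$$\frac{1}{\operatorname{var} t} = -\left( \frac{\partial^2 \log L}{\partial \theta^2} \right)_{\theta = t}$$

and hence, approximately,

$$\left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t'} = -\frac{t' - t}{\operatorname{var} t}.$$

Hence

$$t'' = t' + \operatorname{var} t \left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t'} = t' - (t' - t) = t,$$

and $t''$ is an efficient estimator to a better order of approximation. This process may be repeated and, rather like Newton's successive approximation to the roots of an equation, may be expected to improve the efficiency of an estimator.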


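Example 17.11

Suppose we have to estimate $\theta$, the parameter in the Cauchy population

$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

We have already seen that the sample-mean is not a satisfactory estimate and that for large samples the median is consistent and has variance $\pi^2/4n$.

The equation of maximum likelihood gives

$$\frac{\partial \log L}{\partial \theta} = \sum \frac{2 (x - \theta)}{1 + (x - \theta)^2} = 0.$$

This is a $(2n-1)$-ic in $\theta$ and correspondingly difficult to solve. We may, however, find the variance of the solution $\hat{\theta}$ from (17.63). We have

$$\int_{-\infty}^{\infty} \frac{\partial^2 \log f}{\partial \theta^2}\, f\, dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{2 (x - \theta)^2 - 2}{\{1 + (x - \theta)^2\}^3}\, dx = \frac{2}{\pi} \int_{-\infty}^{\infty} \frac{x^2 - 1}{(1 + x^2)^3}\, dx = -\frac{1}{2}.$$

Hence

$$\operatorname{var} \hat{\theta} = \frac{2}{n}.$$

The median, therefore, has an efficiency of $8/\pi^2$, approximately 0.8, and we expect that

$$t'' = t' + \operatorname{var} \hat{\theta} \left( \frac{\partial \log L}{\partial \theta} \right)_{\theta = t'} = t' + \frac{4}{n} \sum \frac{x - t'}{1 + (x - t')^2},$$

where $t'$ denotes the median, will be an improved estimator.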
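The correction of the median can be tried out by simulation. The following sketch is an illustration only: it draws Cauchy samples by the inverse of the distribution function, applies one step of the correction with the limiting variance $2/n$, and compares the sampling variances of the median and of the corrected estimator. The function names, the sample size, the number of repetitions and the seed are all choices made here.

```python
import math
import random
import statistics

def cauchy_score(xs, theta):
    """d log L / d theta for a Cauchy sample with location theta."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)

def one_step_estimate(xs):
    """Median t' corrected by var(theta_hat) * score(t'), with var(theta_hat) = 2/n."""
    n = len(xs)
    t1 = statistics.median(xs)
    return t1 + (2.0 / n) * cauchy_score(xs, t1)

def draw_cauchy(n, theta, rng):
    """Inverse-CDF sampling: theta + tan(pi * (U - 1/2)), U uniform on [0, 1)."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def compare_variances(reps=2000, n=51, theta=0.0, seed=12345):
    """Simulated sampling variances of the median and of the one-step estimate."""
    rng = random.Random(seed)
    medians, corrected = [], []
    for _ in range(reps):
        xs = draw_cauchy(n, theta, rng)
        medians.append(statistics.median(xs))
        corrected.append(one_step_estimate(xs))
    return statistics.pvariance(medians), statistics.pvariance(corrected)

var_median, var_one_step = compare_variances()
print(var_median, var_one_step)
```

With these settings the corrected estimator should show the smaller variance, in line with the efficiency of about 0.8 found for the median; the exact figures depend on the seed and on the finite sample size.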


Most General Form of Distributions possessing Sufficient Estimators

17.30. If $t$ is sufficient for $\theta$ we have

$$\frac{\partial \log L}{\partial \theta} = K(t, \theta), \tag{17.79}$$

where $K$ is some function of $t$ and $\theta$. Regarding this as an equation in $t$ we see that it remains true for any particular value of $\theta$, say zero. It is then evident that $t$ must be expressible in the form

$$t = M\left\{ \sum_{j=1}^{n} k(x_j) \right\}, \tag{17.80}$$

where $M$ and $k$ are arbitrary functions. If $w = \sum k(x)$ then $K$ is a function of $\theta$ and $w$ only, say $N(\theta, w)$. We have then

$$\frac{\partial^2 \log L}{\partial \theta\, \partial x_j} = \frac{\partial N}{\partial w} \frac{\partial w}{\partial x_j}. \tag{17.81}$$

Now the left-hand side is a function of $\theta$ and $x_j$ only and $\partial w/\partial x_j$ is a function of $x_j$ only. Hence $\partial N/\partial w$ is a function of $\theta$ and $x_j$ only. But it must be symmetrical in the $x$'s and hence is a function of $\theta$ only. Hence, integrating with respect to $w$, we have

$$N(\theta, w) = p(\theta)\, w + q(\theta),$$

where $p$ and $q$ are arbitrary functions of $\theta$. Thus

$$\frac{\partial \log L}{\partial \theta} = \frac{\partial}{\partial \theta} \sum \log f(x, \theta) = p(\theta) \sum k(x) + q(\theta), \tag{17.82}$$

whence

$$\frac{\partial}{\partial \theta} \log f(x, \theta) = p(\theta)\, k(x) + \frac{1}{n}\, q(\theta),$$

giving

$$f(x, \theta) = \exp\left\{ p(\theta)\, k(x) + q(\theta) + r(x) \right\}, \tag{17.83}$$

where we still write $p$ and $q$ for the arbitrary functions.

The expression may also be written

$$f(x, \theta) = Q(\theta)\, R(x) \exp\left\{ p(\theta)\, k(x) \right\} \tag{17.84}$$

or, if we simplify the specification of the distribution by writing $\theta$ instead of $p(\theta)$,

$$f(x, \theta) = Q(\theta)\, R(x) \exp\left\{ \theta\, k(x) \right\}. \tag{17.85}$$

It will be found that if (17.85) holds, the likelihood function is of the required form for the existence of a sufficient estimator, so that the equation is sufficient as well as necessary.
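As an illustration of the form $Q(\theta)\, R(x) \exp\{\theta\, k(x)\}$, the Poisson term $e^{-\lambda} \lambda^x / x!$ can be written with $\theta = \log \lambda$, $Q(\theta) = \exp(-e^{\theta})$, $R(x) = 1/x!$ and $k(x) = x$. This identification is worked out here and is not taken from the text; the sketch below simply checks it numerically for a hypothetical value of the parameter.

```python
import math

def poisson_pmf(x, lam):
    """General term of the Poisson distribution."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def exponential_form(x, theta):
    """The same term written as Q(theta) * R(x) * exp(theta * k(x))."""
    Q = math.exp(-math.exp(theta))      # Q(theta), with theta = log(lambda)
    R = 1.0 / math.factorial(x)         # R(x)
    k = x                               # k(x)
    return Q * R * math.exp(theta * k)

lam = 3.5                               # hypothetical parameter value
gaps = [abs(poisson_pmf(x, lam) - exponential_form(x, math.log(lam)))
        for x in range(12)]
print(max(gaps))                        # agreement up to rounding error
```

Since $k(x) = x$, the statistic $\sum k(x)$ is the sample total, which agrees with the sufficiency of the sample mean in the Poisson case.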


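Distribution of Sufficient Estimators

17.31. It is remarkable that the distribution of a sufficient estimator can be obtained directly from the likelihood function. From (17.85) we have

$$\log L = n \log Q + \sum \log R(x) + \theta \sum k(x),$$

giving, for the maximum likelihood estimator,

$$\frac{n}{Q} \frac{\partial Q}{\partial \theta} + \sum k(x) = 0. \tag{17.86}$$

Now, for the characteristic function $\phi(\alpha)$ of $w$ $(= \sum k(x))$ we have:

$$\begin{aligned} \phi(\alpha) &= \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i\alpha \sum k(x)}\, f(x_1, \theta) \cdots f(x_n, \theta)\, dx_1 \cdots dx_n \\ &= \left\{ \int_{-\infty}^{\infty} e^{i\alpha k(x)}\, f(x, \theta)\, dx \right\}^n \\ &= \left\{ \int_{-\infty}^{\infty} Q(\theta)\, R(x)\, e^{(\theta + i\alpha) k(x)}\, dx \right\}^n \\ &= \left\{ \frac{Q(\theta)}{Q(\theta + i\alpha)} \right\}^n. \end{aligned} \tag{17.87}$$

Hence the frequency function of $w$, if existent, is

$$f(w) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\alpha w} \left\{ \frac{Q(\theta)}{Q(\theta + i\alpha)} \right\}^n d\alpha.$$

Now from (17.86),

$$w = -\frac{n}{Q} \frac{\partial Q}{\partial \theta} \bigg|_{\theta = t} = n S(t), \text{ say,}$$

and hence the frequency function of the estimator $t$ is

$$f(t) = \frac{n}{2\pi} \left( \frac{\partial S}{\partial t} \right) \int_{-\infty}^{\infty} e^{-i\alpha n S(t)} \left\{ \frac{Q(\theta)}{Q(\theta + i\alpha)} \right\}^n d\alpha. \tag{17.88}$$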


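Example 17.12

The normal distribution with unit variance may be put in the form

$$f = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}\, e^{-\frac{1}{2} \theta^2}\, e^{\theta x}.$$

Comparing this with (17.85), we see that if

$$Q(\theta) = e^{-\frac{1}{2} \theta^2}, \qquad R(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} x^2}, \qquad k(x) = x,$$

the condition for a sufficient estimator is satisfied. That this is (as we already know) the mean $\bar{x}$ may be confirmed from (17.88). We have

$$S(\theta) = -\frac{1}{Q} \frac{\partial Q}{\partial \theta} = \theta$$

and hence for the frequency function of the estimator $\bar{x}$,

$$\begin{aligned} f(\bar{x}) &= \frac{n}{2\pi} \int_{-\infty}^{\infty} e^{-i\alpha n \bar{x}} \left\{ \frac{e^{-\frac{1}{2} \theta^2}}{e^{-\frac{1}{2} (\theta + i\alpha)^2}} \right\}^n d\alpha \\ &= \frac{n}{2\pi} \int_{-\infty}^{\infty} \exp\left\{ -\tfrac{1}{2} n \alpha^2 - i\alpha n (\bar{x} - \theta) \right\} d\alpha \\ &= \sqrt{\frac{n}{2\pi}} \exp\left\{ -\tfrac{1}{2} n (\bar{x} - \theta)^2 \right\}. \end{aligned}$$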


Example 17.13 


The Type III distribution considered in Example 17.8 may be put in the slightly
different form
$$dF = \frac{\gamma^p}{\Gamma(p)}\, x^{p-1} e^{-\gamma x}\, dx, \qquad 0 \le x \le \infty.$$
Regarding $p$ as known and considering $\gamma$ as the parameter under estimate, we see that
a sufficient estimator exists, because we may write
$$Q(\gamma) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = x^{p-1}, \qquad k(x) = -x,$$
which throws the distribution into the form (17.85). We have found the estimator and
its distribution in Example 17.8.
On the other hand, suppose that $\gamma$ is known and we wish to estimate $p$. Writing
$$Q(p) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = e^{-\gamma x - \log x}, \qquad k(x) = \log x,$$
we see that a sufficient estimator for $p$ also exists. It is the solution of
$$-\frac{\partial}{\partial p} \log \Gamma(p) + \log \gamma + \frac{1}{n} \sum \log x = 0,$$




which does not permit of expression of $p$ as a simple function of the $x$'s. The sampling
distribution is not expressible in a simple form.
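Although the equation for $p$ in Example 17.13 has no simple closed solution, it is easily solved numerically. The sketch below is one way of doing so and is not taken from the text: it assumes the equation $-\psi(p) + \log\gamma + \frac{1}{n}\sum\log x = 0$, where $\psi$ is the derivative of $\log\Gamma$, approximates $\psi$ by its recurrence and asymptotic series, and brackets the root by bisection. All function names are hypothetical.

```python
import math


def digamma(x):
    """d/dx log Gamma(x) for x > 0, via recurrence and an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift the argument upward: psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ log x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def shape_mle(xs, gamma, lo=1e-6, hi=1e6, iters=200):
    """Solve -psi(p) + log(gamma) + mean(log x) = 0 for p by bisection.

    psi is increasing, so the root is unique; the bracket [lo, hi] is an
    assumed search range and is halved on the logarithmic scale.
    """
    target = math.log(gamma) + sum(math.log(x) for x in xs) / len(xs)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if digamma(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

The estimate depends on the sample only through $\sum \log x$, in agreement with $k(x) = \log x$ being the sufficient statistic.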


Example 17.14 

Consider again the Cauchy distribution
$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty.$$
Evidently this cannot be thrown into the form (17.85) and hence no sufficient estimator
exists. We have already found (Example 17.11) that there is an efficient estimator. For
finite $n$ no single estimator can contain all that the sample can tell us about $\theta$.


Sufficient Estimators when the Range depends on the Parameter 


17.32. One of the conditions of the theorem of 17.23 and that of 17.27 is that the
range should be independent of $\theta$. In the contrary case our results, particularly for sufficient
estimators, require reconsideration.

Suppose the range of the frequency function is from $\theta$ to $b$, where $b$ is fixed. If there
is a sufficient estimator for $\theta$, say $t$, the distribution of any other estimator, for fixed $t$, is
independent of $\theta$. Take $x_1$, the lowest value of the sample, as such other estimator. Then
if $t$ is fixed the distribution of $x_1$ is independent of $\theta$, which is clearly impossible unless in
fixing $t$ we also fix $x_1$, that is to say, $t$ is a function of $x_1$. Thus if a sufficient estimator
exists it must be a function of $x_1$.

Similarly if the range is from $a$ to $\theta$, a sufficient estimator for $\theta$ must be a function
of the largest sample member.


17.33. If $x_1$ or some function of it is sufficient for $\theta$, the lower extremity of the range,
and $x_1$ is fixed, the probability that any particular sample value $x$ is greater than $x_1$ is
proportional to $f(x, \theta)$. This must be independent of $\theta$, since $x_1$ is sufficient, and hence
so is $f(x, \theta)/f(x_1, \theta)$. Thus
$$f(x, \theta)\, h(\theta) = g(x), \qquad (17.89)$$
and this is the most general form admitting a sufficient estimator.

It remains true in such circumstances that the smallest member of the sample is
a maximum likelihood estimator. For the likelihood is
$$L = \frac{\prod g(x)}{\{h(\theta)\}^n},$$
which is clearly a maximum when $h(\theta)$ is a minimum. Now since the total frequency is
unity we have, from (17.89),
$$h(\theta) = \int_{\theta}^{b} g(x)\, dx. \qquad (17.90)$$
$\theta$ cannot be greater than $x_1$, for then such a sample value could not appear. The value
which minimises $h(\theta)$ is seen from (17.90) to be that which minimises the range, i.e. $x_1$.
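A small numerical illustration of 17.33 may help. The sketch assumes a hypothetical instance of the form $f = g(x)/h(\theta)$, namely $f(x, \theta) = e^{-(x - \theta)}$ for $x \ge \theta$ (so $g(x) = e^{-x}$, $h(\theta) = e^{-\theta}$, with the upper terminal taken at infinity), and searches a grid of $\theta$ values for the one of greatest likelihood. The distribution, the data and the function names are illustrative only.

```python
import math


def log_likelihood(theta, xs):
    """Log-likelihood for f(x, theta) = exp(-(x - theta)), x >= theta.

    A value of theta above the smallest observation is impossible, so the
    log-likelihood is minus infinity there.
    """
    if theta > min(xs):
        return -math.inf
    return sum(-(x - theta) for x in xs)


def grid_argmax(xs, lo, hi, steps):
    """Return the grid point in [lo, hi] with the greatest log-likelihood."""
    grid = [lo + (hi - lo) * j / steps for j in range(steps + 1)]
    return max(grid, key=lambda th: log_likelihood(th, xs))
```

On any grid the maximising value is the largest grid point not exceeding the smallest observation, so that as the grid is refined the estimate tends to $x_1$.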


17.34. When both extremes of the range, $a$ and $b$, depend on $\theta$, some further modification
is necessary. Suppose that $a$ is equal to $\theta$ and that $b(\theta)$ is some strictly decreasing




function of $\theta$. Let $X_n$ be the value such that $b(X_n) = x_n$, the greatest member of the
sample, and let $t$ be the smaller of $x_1$ and $X_n$. Then of the inequalities
$$t \le x_1, \qquad b(t) \ge x_n \qquad (17.91)$$
one at least is true. But the first equality implies that $t \ge \theta$ and the second that
$b(t) \le b(\theta)$, and either of these two implies the other. Hence both inequalities in (17.91)
are true, and
$$\theta \le t \le x_1 \le x_n \le b(t) \le b(\theta). \qquad (17.92)$$
Samples with fixed $t$ then lie in a fixed range, and hence $t$ is sufficient if the frequency
function is of the form (17.89). It would seem that this remains the most general form of
frequency function admitting a sufficient estimator when both extremes of the range
depend on $\theta$.


Example 17.15 
Consider the rectangular distribution
$$dF = \frac{dx}{2\theta}, \qquad -\theta \le x \le \theta.$$
If we take the ordinary likelihood equation we get
$$\frac{\partial}{\partial\theta} \log L = -\frac{\partial}{\partial\theta}\, n \log(2\theta) = -\frac{n}{\theta}.$$
For this to vanish $\theta$ must tend to infinity, an obviously nugatory result. In accordance
with the above discussion we should take as our estimate of $\theta$ the smaller of $x_1$ and $-x_n$,
and this is obviously sufficient, for nothing in the sample can tell us more about the
terminals of the range than its most extreme members.


Intrinsic Accuracy 
17.35. If the sampling distribution of an estimator $t$ is
$$dF = \phi(t, \theta)\, dt \qquad (17.93)$$
we define the accuracy of $t$ as
$$I' = \int \frac{1}{\phi} \left( \frac{\partial \phi}{\partial \theta} \right)^2 dt = E \left( \frac{\partial \log \phi}{\partial \theta} \right)^2. \qquad (17.94)$$
It is evidently essentially a positive quantity. We assume, unless the contrary is stated,
that the range is independent of $\theta$.

$I'$ is the quantity we have already encountered in (17.67) as the reciprocal of the
variance of $t$ when it tends to normality in large samples. As in 17.27, we have
$$I' \le n \int \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx \qquad (17.95)$$
$\le nI$, say, where
$$I = \int \left( \frac{\partial \log f}{\partial \theta} \right)^2 f\, dx. \qquad (17.96)$$
Now $I$ is independent of the estimator $t$ and we may call it the intrinsic accuracy of
the distribution $f$ in regard to $\theta$. It is intrinsic because it depends only on $f$. It may



be termed accuracy because it provides, for large samples at least, a minimum to the
variance of possible estimators of $\theta$. We know from 17.25 that under certain conditions
the maximum likelihood estimator attains this minimum for large samples.


17.36. We may now extend the definition of efficiency of an estimator to the case 
of small samples. In fact, the efficiency is the ratio of the accuracy of an estimator to the 
intrinsic accuracy of the distribution for the parameter under estimate. This is easily 
seen to apply to the case of large samples for which efficiency was defined in 17.12, and 
may be applied to finite samples or non-normal sampling variation. For such cases, 
however, it is conceivable that the efficiency might exceed unity. A proof that this is not 
so when the range is independent of $\theta$ is suggested in Exercise 17.12.


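17.37. If the range is independent of $\theta$ we have
$$E \left( \frac{\partial \log f}{\partial \theta} \right) = \int \frac{\partial f}{\partial \theta}\, dx = \frac{\partial}{\partial \theta} \int f\, dx = 0$$
and hence the following three expressions for the intrinsic accuracy are equivalent:
$$E \left( \frac{\partial \log f}{\partial \theta} \right)^2, \qquad \operatorname{var} \left( \frac{\partial \log f}{\partial \theta} \right), \qquad -E \left( \frac{\partial^2 \log f}{\partial \theta^2} \right). \qquad (17.97)$$
This equivalence holds, even when the range depends on $\theta$, if $f$ is zero at the extremes
of the range. For we then have
$$0 = \frac{\partial}{\partial \theta} \int_a^b f\, dx = \int_a^b \frac{\partial f}{\partial \theta}\, dx + f(b) \frac{\partial b}{\partial \theta} - f(a) \frac{\partial a}{\partial \theta} = \int_a^b \frac{\partial f}{\partial \theta}\, dx.$$
But if $f$ is not zero at the extremes the equivalence may break down. (Cf. Exercises 17.9
and 17.11.)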
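The equivalence of the first and third expressions in 17.37 may be illustrated numerically. The sketch below assumes the Cauchy distribution $f = 1/[\pi\{1 + (x - \theta)^2\}]$ as a test case and uses the substitution $x - \theta = \tan u$, under which $f\, dx = du/\pi$; the choice of distribution, the midpoint rule and the function name are illustrative assumptions.

```python
import math


def cauchy_intrinsic_accuracy(steps=20000):
    """Return (E[(d log f/d theta)^2], -E[d^2 log f/d theta^2]) for the Cauchy form.

    Both expectations are computed by the midpoint rule in u, where
    x - theta = tan(u) and u runs over (-pi/2, pi/2).
    """
    h = math.pi / steps
    score_sq = 0.0
    neg_second = 0.0
    for j in range(steps):
        u = -0.5 * math.pi + (j + 0.5) * h
        z = math.tan(u)                       # z = x - theta
        score = 2.0 * z / (1.0 + z * z)       # d log f / d theta
        second = (2.0 * z * z - 2.0) / (1.0 + z * z) ** 2  # d^2 log f / d theta^2
        score_sq += score * score
        neg_second -= second
    return score_sq * h / math.pi, neg_second * h / math.pi
```

Both quantities come out at one half, the intrinsic accuracy of this distribution.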


Amount of Information 


17.38. The quantity $nI$ has been called the amount of information about $\theta$ in the
sample of $n$, and $I$ may be called the amount of information per member of the sample.
The use of "information" in this specialised sense has not been universally accepted,
but some of the properties of $I$ are such as we should require of any measure of information.

(a) If the parent does not contain $\theta$, $I = 0$ so that no sample can tell us anything
about $\theta$, which must obviously be so.

(b) Since sufficient estimators contain all the relevant information in the sample
we expect their accuracy to be $nI$, and conversely. That this is so may be seen as
in 17.27 and 17.28. In fact, if $t$ is such that the equality in (17.72) holds, var $u = 0$
and for fixed $t$, $\partial \log L / \partial \theta$ is constant, irrespective of the form of distribution of $t$.
Log $L$ is then of the type required for sufficiency.




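(c) The sum of the amounts of information in two independent sample-members
is the amount of information in the pair taken together. For if their joint distribution is
$$dF = f_1(x, \theta)\, dx\, f_2(y, \theta)\, dy,$$
we have for the intrinsic accuracy
$$I = -\iint \frac{\partial^2 \log f_1 f_2}{\partial \theta^2} f_1 f_2\, dx\, dy$$
$$= -\iint \frac{\partial^2 \log f_1}{\partial \theta^2} f_1 f_2\, dx\, dy - \iint \frac{\partial^2 \log f_2}{\partial \theta^2} f_1 f_2\, dx\, dy$$
$$= -\int \frac{\partial^2 \log f_1}{\partial \theta^2} f_1\, dx - \int \frac{\partial^2 \log f_2}{\partial \theta^2} f_2\, dy, \qquad (17.98)$$
which is the property stated.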


Loss of Accuracy 


17.39. Where no sufficient estimator exists, it follows from (b) of the previous paragraph
that no estimator for finite $n$ can contain all the information in the sample. In
so far as any particular estimator falls short of the ideal we may be said to lose information
by using it. No estimator can avoid losing something, although of course some may
lose less than others.

Presumably the loss will be greater for large samples than for small ones, and will
be least for maximum likelihood estimators. We may calculate the loss in this case. If
$t$ is the maximum likelihood estimator of $\theta$, we have, to a first approximation,


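$$\frac{\partial \log L}{\partial \theta} = (\theta - t) \frac{\partial^2 \log L}{\partial \theta^2}.$$
The variance of $\partial \log L / \partial \theta$ in samples for which $t$ is constant is thus the variance of
$\partial^2 \log L / \partial \theta^2$ within the set multiplied by $(t - \theta)^2$. Now the total loss of information,
from 17.27, is seen to be var $u$ = var $(\partial \log L / \partial \theta)$, and hence is equal to the variance of $t$
multiplied by the total variance of $\partial^2 \log L / \partial \theta^2$ within sets for which $t$ is constant. This
we now evaluate.
Suppose the distribution is grouped so that the "expected" frequency in the $j$th
group is $m_j$. The likelihood is then proportional to $m_1^{n_1} m_2^{n_2} \ldots$ and apart from
constants independent of $\theta$ we have
$$\log L = \sum n_j \log m_j, \qquad (17.100)$$
$$\frac{\partial \log L}{\partial \theta} = \sum \frac{m'}{m}\, n, \quad \text{where } m' = \frac{\partial m}{\partial \theta}, \qquad (17.101)$$
$$\frac{\partial^2 \log L}{\partial \theta^2} = \sum \left\{ \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right) n \right\}. \qquad (17.102)$$
We have at once
$$\frac{1}{\operatorname{var} t} = -\sum \left\{ \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right) m \right\} = -\sum \left\{ m'' - \frac{m'^2}{m} \right\} = \sum \left( \frac{m'^2}{m} \right). \qquad (17.103)$$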




We shall find it most convenient to regard the $n$'s as distributed over the groups first of
all without restriction and then subject to two linear constraints expressed by $\sum (n_j) = n$
and $\dfrac{\partial \log L}{\partial \theta} = \sum \left( \dfrac{m'}{m}\, n \right) = $ constant. From this viewpoint the $n$'s may be regarded as
distributed in the Poisson form with mean and variance $m$ (not the binomial because we
are not introducing the restriction that the samples should be of fixed size, except as a
constraint).

Now if $\sum (k_j n_j)$ is a linear function of the $n$'s subject to a linear constraint $\sum (\kappa_j n_j) = p$,
its variance is
$$\sum (k^2 m) - \frac{\{\sum (k \kappa m)\}^2}{\sum (\kappa^2 m)}, \qquad (17.104)$$
and a second constraint reduces the variance by a term similar to the second in this expression.
The result may be seen from geometrical considerations. We may write
$$\sum (kn) = \sum \left( k \sqrt{m} \cdot \frac{n}{\sqrt{m}} \right) \quad \text{and} \quad \sum (\kappa n) = \sum \left( \kappa \sqrt{m} \cdot \frac{n}{\sqrt{m}} \right),$$
where the variables $n/\sqrt{m}$ have unit variance and mean $\sqrt{m}$. Consider the different values
of the $n$'s, say $s$ in number, as the co-ordinates in a Euclidean space. The density function
of the variables is then symmetrical about a point $(\sqrt{m_1}, \sqrt{m_2}, \ldots, \sqrt{m_s})$ to which we
transfer the origin. The variance of the unconstrained variables is then equal to the
reciprocal of the square of the distance from the origin to the hyperplane $\sum (k \sqrt{m}\, x) = 1$,
namely, to $\sum (k^2 m)$. But when the constraint is imposed, the variance becomes proportional to the
reciprocal of the square of the distance from the origin to the hyperplane in the direction
parallel to $\sum (\kappa \sqrt{m}\, x) = 0$ and is hence reduced by the amount
$$\cos^2 \phi\, \sum (k^2 m),$$
where $\phi$ is the angle between the planes. This quantity is
$$\frac{\{\sum (k \sqrt{m} \cdot \kappa \sqrt{m})\}^2}{\sum (k^2 m) \sum (\kappa^2 m)}\, \sum (k^2 m),$$
which gives us the second term in (17.104).


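Now for the first linear constraint $\sum (n) = $ constant $ = n$ we have $\kappa = 1$, and the
reducing term is (since $\sum (m) = n$ also):
$$-\frac{1}{n} \{\sum (km)\}^2.$$
For the second constraint we have $\kappa = m'/m$ and hence the term is
$$-\frac{\{\sum (km')\}^2}{\sum (m'^2/m)}.$$
Thus the variance of $\sum (kn)$ is
$$\sum (k^2 m) - \frac{1}{n} \{\sum (km)\}^2 - \frac{\{\sum (km')\}^2}{\sum (m'^2/m)}. \qquad (17.105)$$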


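Now taking
$$k = \frac{m''}{m} - \frac{m'^2}{m^2}$$
and remembering that
$$\frac{1}{\operatorname{var} t} = \sum \left( \frac{m'^2}{m} \right),$$
we see from (17.102) that the loss of information is, for large samples,
$$\frac{1}{\sum (m'^2/m)} \left[ \sum \left\{ m \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right)^2 \right\} - \frac{1}{n} \left\{ \sum \left( \frac{m'^2}{m} \right) \right\}^2 - \frac{\left\{ \sum m' \left( \dfrac{m''}{m} - \dfrac{m'^2}{m^2} \right) \right\}^2}{\sum (m'^2/m)} \right]. \qquad (17.106)$$
By considering the width of the groups as tending to zero we may apply this result
also to continuous distributions.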


Example 17.16 
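In the distribution
$$dF = \frac{1}{\pi} \frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty,$$
there is no sufficient estimator, as we have seen. Let us consider the loss of information
consequent upon using the maximum likelihood estimator.
We may write for our "expected" value $m$
$$m = \frac{n}{\pi} \frac{dx}{1 + (x - \theta)^2}.$$
Hence
$$\sum \left( \frac{m'^2}{m} \right) = \frac{n}{\pi} \int_{-\infty}^{\infty} \frac{4 p^2\, dp}{(1 + p^2)^3} = \frac{n}{2},$$
$$\sum \left\{ m \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right)^2 \right\} = \frac{n}{\pi} \int_{-\infty}^{\infty} \frac{4 (p^2 - 1)^2}{(1 + p^2)^5}\, dp = \frac{7n}{8},$$
$$\sum \left\{ m' \left( \frac{m''}{m} - \frac{m'^2}{m^2} \right) \right\} = 0.$$
The intrinsic accuracy of the original distribution is 1/2, so the loss of information is equivalent
to 2½ observations for large samples. For small samples it will presumably be smaller,
since it vanishes for samples of one. The loss by use of the maximum likelihood estimator
is therefore very slight and becomes of diminishing importance as the size of the sample
increases.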
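The figures of Example 17.16 can be reproduced by numerical integration. The sketch below assumes the large-sample expression for the loss derived in 17.39, written per observation so that the sums over groups become expectations; the substitution $x - \theta = \tan u$, the midpoint rule and the function names are illustrative choices and not part of the text.

```python
import math


def cauchy_expectations(steps=20000):
    """Return E[s^2], E[d^2], E[s d] for the Cauchy distribution.

    s = d log f / d theta and d = d^2 log f / d theta^2; with x - theta = tan(u)
    the density element is du / pi, integrated here by the midpoint rule.
    """
    h = math.pi / steps
    e_ss = e_dd = e_sd = 0.0
    for j in range(steps):
        u = -0.5 * math.pi + (j + 0.5) * h
        z = math.tan(u)
        s = 2.0 * z / (1.0 + z * z)
        d = (2.0 * z * z - 2.0) / (1.0 + z * z) ** 2
        e_ss += s * s
        e_dd += d * d
        e_sd += s * d
    scale = h / math.pi
    return e_ss * scale, e_dd * scale, e_sd * scale


def loss_of_information():
    """Large-sample loss: {E[d^2] - (E[s^2])^2 - (E[s d])^2 / E[s^2]} / E[s^2]."""
    e_ss, e_dd, e_sd = cauchy_expectations()
    return (e_dd - e_ss * e_ss - e_sd * e_sd / e_ss) / e_ss
```

Dividing the loss by the intrinsic accuracy `e_ss` expresses it as an equivalent number of observations.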


Ancillary Estimators 


17.40. Where no sufficient estimator exists no single estimator can avoid the loss 
of information ; but we may take an additional function of the variables which, together 
with the maximum likelihood estimator, will give an accuracy tending to unity in large 
samples. By taking a third function we can improve the accuracy still further, and so 




on. The process is analogous to approximating to the value of a function (the likelihood 
function) by ascertaining its differential coefficients at some particular point of the range. 


In fact, suppose that, in addition to the estimator which gives $\partial \log L / \partial \theta$ for some value
of $\theta$ such as $t$, we also find $\partial^2 \log L / \partial \theta^2$ for that value. The variance of $\partial \log L / \partial \theta$ over values
in the neighbourhood of those for which these two are constant is then, to the first
approximation, the variance of
$$\tfrac{1}{2} (\theta - t)^2 \frac{\partial^3 \log L}{\partial \theta^3},$$
which has ordinarily a mean value and variance of lower order in $n$. In particular, if $t$
is the maximum likelihood estimator, so that $\left( \dfrac{\partial \log L}{\partial \theta} \right)_{\theta = t} = 0$, the value of $\left( \dfrac{\partial^2 \log L}{\partial \theta^2} \right)_{\theta = t}$
may provide supplementary information which enables us to approximate more closely
to the likelihood function and hence salvage some of the lost information. Such a quantity
is accordingly called an ancillary estimator. Cf. 17.29 above.


Multivariate Distributions with One Parameter 


17.41. We now proceed to consider the extension of some of the foregoing results 
in two directions: (a) where there is more than one variate but still only one parameter, 
and (b) where there is more than one parameter to be estimated. 

The former raises no new point of difficulty. To take the bivariate case as an example,
if the frequency function is $f(x, y, \theta)$, the likelihood is
$$L = f(x_1, y_1, \theta) \cdots f(x_n, y_n, \theta) \qquad (17.107)$$
and our maximum likelihood estimator is obtained by maximising $L$ in the usual way.


Example 17.17 
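To estimate the parameter $\rho$ in samples of $n$ from
$$dF = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp \left\{ -\frac{1}{2(1 - \rho^2)} (x^2 - 2\rho xy + y^2) \right\} dx\, dy.$$
We find
$$\log L = \text{constant} - \frac{n}{2} \log (1 - \rho^2) - \frac{1}{2(1 - \rho^2)} \{ \sum (x^2) - 2\rho \sum (xy) + \sum (y^2) \},$$
whence, for $\partial \log L / \partial \rho = 0$ we have
$$\frac{n\rho}{1 - \rho^2} - \frac{\rho}{(1 - \rho^2)^2} \{ \sum (x^2) - 2\rho \sum (xy) + \sum (y^2) \} + \frac{1}{1 - \rho^2} \sum (xy) = 0,$$
reducing to the cubic in $\rho$,
$$n\rho (1 - \rho^2) + (1 + \rho^2) \sum (xy) - \rho \{ \sum (x^2) + \sum (y^2) \} = 0.$$
It is interesting to note that this does not give the product-moment of the sample.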


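We have, after a little reduction,
$$\frac{\partial^2 \log L}{\partial \rho^2} = \frac{n (1 + \rho^2)}{(1 - \rho^2)^2} - \frac{1 + 3\rho^2}{(1 - \rho^2)^3} \{ \sum (x^2) + \sum (y^2) \} + \frac{6\rho + 2\rho^3}{(1 - \rho^2)^3} \sum (xy).$$
Since $E(x^2) = E(y^2) = 1$ and $E(xy) = \rho$, we have for the estimator $\hat\rho$,
$$\frac{1}{\operatorname{var} \hat\rho} = -E \left( \frac{\partial^2 \log L}{\partial \rho^2} \right) = \frac{n (1 + \rho^2)}{(1 - \rho^2)^2},$$
whence
$$\operatorname{var} \hat\rho = \frac{(1 - \rho^2)^2}{n (1 + \rho^2)}.$$
This is less (and may be considerably less) than the variance of the sample product-moment
in large samples, $(1 - \rho^2)^2 / n$. The efficiency of the latter is $1/(1 + \rho^2)$.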
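The cubic of Example 17.17 is readily solved numerically. The sketch below assumes the form $n\rho(1 - \rho^2) + (1 + \rho^2)\sum(xy) - \rho\{\sum(x^2) + \sum(y^2)\} = 0$; at $\rho = -1$ the left-hand side equals $\sum(x + y)^2 \ge 0$ and at $\rho = +1$ it equals $-\sum(x - y)^2 \le 0$, so a root always lies in $[-1, 1]$ and bisection will find one. The data and function names are hypothetical; if the cubic has more than one root in the interval the likelihoods at the roots would need to be compared, which this sketch does not do.

```python
def rho_cubic(rho, n, sxx, syy, sxy):
    """Left-hand side of the likelihood cubic in rho."""
    return n * rho * (1.0 - rho * rho) + (1.0 + rho * rho) * sxy - rho * (sxx + syy)


def solve_rho(xs, ys, tol=1e-12):
    """Find a root of the cubic in [-1, 1] by bisection."""
    n = len(xs)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    lo, hi = -1.0, 1.0  # cubic is >= 0 at -1 and <= 0 at +1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rho_cubic(mid, n, sxx, syy, sxy) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In general the root differs from $\sum(xy)/n$, as the example remarks.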


Simultaneous Estimation of Several Parameters 


17.42. We now turn to the case when the unknown parameters are more than one
in number. To simplify the exposition we shall consider the case of two parameters $\theta_1$
and $\theta_2$, but examples not infrequently arise where more than two have to be estimated,
for instance, in the fitting of certain Pearson curves there are four. To fix the ideas,
consider the normal distribution
$$dF = \frac{1}{\theta_2 \sqrt{2\pi}} \exp \left\{ -\frac{1}{2\theta_2^2} (x - \theta_1)^2 \right\} dx, \qquad -\infty \le x \le \infty.$$
The likelihood function, except for constants, is given by
$$\log L = -n \log \theta_2 - \frac{1}{2\theta_2^2} \sum (x - \theta_1)^2. \qquad (17.108)$$
It is natural to generalise our principle of estimation by looking for estimators which shall
maximise $L$ for independent simultaneous variations of $\theta_1$ and $\theta_2$, i.e. to require that
$$\frac{\partial \log L}{\partial \theta_1} = 0, \qquad \frac{\partial \log L}{\partial \theta_2} = 0. \qquad (17.109)$$
In our case this leads to
$$\frac{1}{\theta_2^2} \sum (x - \theta_1) = 0,$$
$$-\frac{n}{\theta_2} + \frac{1}{\theta_2^3} \sum (x - \theta_1)^2 = 0,$$
whence for the estimators $\hat\theta_1$ and $\hat\theta_2$,
$$\hat\theta_1 = \frac{1}{n} \sum (x) = \bar{x}, \qquad (17.110)$$
$$\hat\theta_2^2 = \frac{1}{n} \sum (x - \bar{x})^2. \qquad (17.111)$$
Thus the sample mean and variance are estimates of the population mean and variance.
We note incidentally that the estimator (17.111) is biassed.


17.43. There is one possible source of confusion here which should be removed.
If we know $\theta_1$, then $\hat\theta_2$ is given by
$$\hat\theta_2^2 = \frac{1}{n} \sum (x - \theta_1)^2, \qquad (17.112)$$
which is not the same as (17.111), the sample-mean $\bar{x}$ having been replaced by the known




quantity $\theta_1$. Suppose then we estimate $\theta_1$ by $\bar{x}$, as we may do whether we know $\theta_2$ or not,
since (17.110) does not contain $\theta_2$. We may then ask, what is the estimator of $\theta_2$ which
maximises the likelihood for all samples giving the ascertained value of $\hat\theta_1$, namely, $\bar{x}$?

This is an entirely different question from the one which gave rise to (17.111) and we
must not be surprised if it has a different answer. The variations of $L$ from sample to
sample are now considered in a certain sub-population for which $\bar{x}$ has a fixed value.

In our particular case the problem can be solved explicitly. The likelihood function
can be thrown into the form, with variables $\bar{x}$ and $s$,
$$L\, d\bar{x}\, ds = \frac{\sqrt{n}}{\theta_2 \sqrt{2\pi}} \exp \left\{ -\frac{n}{2\theta_2^2} (\bar{x} - \theta_1)^2 \right\} d\bar{x} \times \frac{n^{\frac{1}{2}(n-1)}}{2^{\frac{1}{2}(n-3)}\, \Gamma \{ \frac{1}{2} (n - 1) \}} \frac{s^{n-2}}{\theta_2^{n-1}} \exp \left( -\frac{n s^2}{2\theta_2^2} \right) ds, \qquad (17.113)$$
where $s^2$ is the sample variance.

If we maximise the likelihood in this form for simultaneous variations of $\theta_1$ and $\theta_2$
we arrive back at (17.110) and (17.111), as of course we must. But if $\bar{x}$ has a fixed value,
the distribution of $s$ becomes of one lower degree of freedom. The likelihood is then
proportional to the second factor in (17.113), viz.
$$\frac{s^{n-2}}{\theta_2^{n-1}} \exp \left( -\frac{n s^2}{2\theta_2^2} \right),$$
and for variations of $\theta_2$ this is maximised by
$$\hat\theta_2^2 = \frac{n}{n - 1}\, s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2. \qquad (17.114)$$
This, it may be noticed, is an unbiassed estimator.
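The two estimators of 17.42 and 17.43 differ only in their divisors, as a short computation shows. The sketch uses the Python standard library, in which `statistics.pvariance` divides by $n$ and `statistics.variance` by $n - 1$; the wrapper name and any data passed to it are hypothetical.

```python
import statistics


def variance_estimates(xs):
    """Return the joint and the conditional estimates of theta_2 squared.

    joint:       (1/n) * sum (x - mean)^2, maximising L over both parameters
    conditional: (1/(n-1)) * sum (x - mean)^2, maximising L for fixed mean
    """
    joint = statistics.pvariance(xs)
    conditional = statistics.variance(xs)
    return joint, conditional
```

The ratio of the second to the first is always $n/(n - 1)$, which tends to unity as $n$ increases.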


17.44. The difference between (17.111) and (17.114) is apt to be confusing, for both
are, in a sense, maximum likelihood estimators. The distinction arises from the fact that
we are considering the variation of $L$ in two different populations, the first over all samples
of size $n$, the second over the more restricted samples subject to the further constraint
$\sum (x) = $ constant. The difference when $n$ is large, of course, is quite unimportant, but
as a theoretical matter the point has some interest.

Which of the two is employed for practical estimation is a matter of choice. At first
sight it may strike the reader as objectionable to use (17.114), because $\bar{x}$ is not known before
the sample is drawn, and there are obvious dangers in basing an inference on properties
of the sample which are determined a posteriori. This objection, however, does not lie
in the present case. We make up our mind beforehand that, whatever $\bar{x}$ may turn out
to be, we will make an inference in relation to the sub-population of samples determined
by it. There is, in fact, no posterior determination of the rule of inference.


17.45. Possibly without realising it, the reader is already accustomed to make an 
inference of this kind in relation to a sample number. We do not usually determine before- 
hand what size the sample must be ; our results (apart from the distinction between small 
and large samples, which is another matter) are true for any n, whatever n may turn out 
to be in practice. In the same way the estimator (17.114) is a maximum likelihood estimator,
whatever $\bar{x}$ may turn out to be, $\bar{x}$ being a property of the sample, just as $n$ is.

The fact remains, of course, that (17.111) and (17.114) give different results. Which 




is the better? The answer depends on what we require of the estimator. If we wish
to choose $\theta_1$ and $\theta_2$ so as to maximise their joint likelihood we choose (17.111). If we wish
to select them so that the likelihood is maximised for $\theta_1$ and then, for the observed $\bar{x}$, is
maximised for $\theta_2$, we choose (17.114).


17.46. It may be shown that, as for the case of one parameter, the likelihood estimators
of several parameters are consistent under very general conditions and tend for
large $n$ to be distributed in the multivariate normal form. We omit the proof of these results,
which the reader will probably be willing to accept, and proceed to a generalisation of
the theorem of 17.26. Thus:

(a) If the frequency function $f(x, \theta_1, \theta_2, \ldots, \theta_p)$ is continuous in $x$, and


(6) if in a certain interval containing the true values 61, O20, . . . On9, = is 
J 
0 
continuous in 6; for every x, x? 2 approaches a continuous function of 0, for large 
j 
0 nee : 
n, and - does not vanish in some interval, then 
j 


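$$n \operatorname{cov} (\hat\theta_j, \hat\theta_k) = \frac{\Delta_{jk}}{\Delta}, \qquad (17.115)$$
where $\Delta$ is the (Hessian) determinant
$$\Delta = \left| \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \frac{\partial \log f}{\partial \theta_k} \right)_0 f\, dx \right| \qquad (17.116)$$
and $\Delta_{jk}$ is the minor of the $j$th row and $k$th column. When $p = 1$ this reduces to the
case of a single parameter.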


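As $n$ tends to infinity the joint distribution of the maximum likelihood estimators
tends to the form
$$f = k \exp \left[ -\frac{n}{2} \sum_{j,k} g_{jk} (\hat\theta_j - \theta_{j0}) (\hat\theta_k - \theta_{k0}) \right]. \qquad (17.117)$$
The theorem will be established if we show that
$$g_{jk} = \int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \frac{\partial \log f}{\partial \theta_k} \right)_0 f\, dx, \qquad (17.118)$$
for then the values of the variances and covariances of the $\hat\theta$'s are as stated in (17.116).
(Compare 15.12.)
Make the transformation
$$q_h = \sum_j A_{hj} (\hat\theta_j - \theta_{j0}) \qquad (17.119)$$
and choose the $A$'s so that the exponential of (17.117) becomes
$$-\frac{n}{2} \sum_h q_h^2.$$
Then
$$g_{jk} = \sum_h A_{hj} A_{hk}. \qquad (17.120)$$
The $q$'s are independent normal variates with variance $1/n$. Hence, from the theorem for
the case of a single parameter, already proved, we have
$$\int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial q_h} \right)^2 f\, dx = 1. \qquad (17.121)$$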




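Further, we have
$$\int_{-\infty}^{\infty} \frac{\partial \log f}{\partial q_h} \frac{\partial \log f}{\partial q_l} f\, dx = 0, \qquad h \ne l, \qquad (17.122)$$
for if we put
$$q_h = \frac{1}{\sqrt{2}} (u_h - u_l) \quad \text{and} \quad q_l = \frac{1}{\sqrt{2}} (u_h + u_l)$$
the expression becomes one half of
$$\int_{-\infty}^{\infty} \left\{ \left( \frac{\partial \log f}{\partial u_h} \right)^2 - \left( \frac{\partial \log f}{\partial u_l} \right)^2 \right\} f\, dx,$$
which vanishes since the $u$'s have the same variance as the $q$'s.

Now
$$\left( \frac{\partial \log f}{\partial \theta_j} \right)_0 = \sum_h \frac{\partial \log f}{\partial q_h} \frac{\partial q_h}{\partial \theta_{j0}} = -\sum_h A_{hj} \frac{\partial \log f}{\partial q_h}.$$
Hence
$$\int_{-\infty}^{\infty} \left( \frac{\partial \log f}{\partial \theta_j} \right)_0 \left( \frac{\partial \log f}{\partial \theta_k} \right)_0 f\, dx = \int_{-\infty}^{\infty} \left( \sum_h A_{hj} \frac{\partial \log f}{\partial q_h} \right) \left( \sum_h A_{hk} \frac{\partial \log f}{\partial q_h} \right) f\, dx$$
$$= \sum_h A_{hj} A_{hk},$$
in virtue of (17.121) and (17.122),
$$= g_{jk}$$
from (17.120). The theorem follows.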


Example 17.18 
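Let us estimate the five parameters of the bivariate normal form
$$dF = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp \left[ -\frac{1}{2(1 - \rho^2)} \left\{ \left( \frac{x - \alpha}{\sigma_1} \right)^2 - \frac{2\rho (x - \alpha)(y - \beta)}{\sigma_1 \sigma_2} + \left( \frac{y - \beta}{\sigma_2} \right)^2 \right\} \right] dx\, dy, \qquad -\infty \le x, y \le \infty.$$
It will be found that the partial differential coefficients of $\log L$ yield, on solution, the
estimators
$$\hat\alpha = \bar{x}, \qquad \hat\beta = \bar{y}, \qquad \hat\sigma_1^2 = \frac{1}{n} \sum (x - \bar{x})^2, \qquad \hat\sigma_2^2 = \frac{1}{n} \sum (y - \bar{y})^2, \qquad \hat\rho = \frac{\sum (x - \bar{x})(y - \bar{y})}{n \hat\sigma_1 \hat\sigma_2},$$
so that for simultaneous estimation the sample means, variances and covariances are
estimates of the corresponding parameters.
To evaluate the sampling variances and covariances we have to evaluate integrals
of the type
$$\iint \frac{\partial \log f}{\partial \theta_j} \frac{\partial \log f}{\partial \theta_k}\, dF.$$
These are easily obtainable, being merely functions of moments of different orders.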




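Taking the parameters $\alpha$, $\beta$, $\sigma_1$, $\sigma_2$, $\rho$ in that order, we find for the Hessian (17.116)
$$\begin{vmatrix}
\dfrac{1}{\sigma_1^2 (1 - \rho^2)} & -\dfrac{\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} & 0 & 0 & 0 \\
-\dfrac{\rho}{\sigma_1 \sigma_2 (1 - \rho^2)} & \dfrac{1}{\sigma_2^2 (1 - \rho^2)} & 0 & 0 & 0 \\
0 & 0 & \dfrac{2 - \rho^2}{\sigma_1^2 (1 - \rho^2)} & -\dfrac{\rho^2}{\sigma_1 \sigma_2 (1 - \rho^2)} & -\dfrac{\rho}{\sigma_1 (1 - \rho^2)} \\
0 & 0 & -\dfrac{\rho^2}{\sigma_1 \sigma_2 (1 - \rho^2)} & \dfrac{2 - \rho^2}{\sigma_2^2 (1 - \rho^2)} & -\dfrac{\rho}{\sigma_2 (1 - \rho^2)} \\
0 & 0 & -\dfrac{\rho}{\sigma_1 (1 - \rho^2)} & -\dfrac{\rho}{\sigma_2 (1 - \rho^2)} & \dfrac{1 + \rho^2}{(1 - \rho^2)^2}
\end{vmatrix}$$
This confirms, what we know already, that the distribution of means is independent of
variances and covariances. We may consider the 2 x 2 block in the top left-hand corner
and the 3 x 3 block in the bottom right-hand corner separately. If the determinants
of these blocks are $\Delta_1$ and $\Delta_2$, we have
$$\Delta_1 = \frac{1}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)}, \qquad \Delta_2 = \frac{4}{\sigma_1^2 \sigma_2^2 (1 - \rho^2)^3}.$$
The minors will be found to be given by
$$(\Delta_{jk}) = \begin{pmatrix}
\dfrac{4}{\sigma_1^2 \sigma_2^4 (1 - \rho^2)^4} & \dfrac{4\rho}{\sigma_1^3 \sigma_2^3 (1 - \rho^2)^4} & 0 & 0 & 0 \\
\dfrac{4\rho}{\sigma_1^3 \sigma_2^3 (1 - \rho^2)^4} & \dfrac{4}{\sigma_1^4 \sigma_2^2 (1 - \rho^2)^4} & 0 & 0 & 0 \\
0 & 0 & \dfrac{2}{\sigma_1^2 \sigma_2^4 (1 - \rho^2)^4} & \dfrac{2\rho^2}{\sigma_1^3 \sigma_2^3 (1 - \rho^2)^4} & \dfrac{2\rho}{\sigma_1^3 \sigma_2^4 (1 - \rho^2)^3} \\
0 & 0 & \dfrac{2\rho^2}{\sigma_1^3 \sigma_2^3 (1 - \rho^2)^4} & \dfrac{2}{\sigma_1^4 \sigma_2^2 (1 - \rho^2)^4} & \dfrac{2\rho}{\sigma_1^4 \sigma_2^3 (1 - \rho^2)^3} \\
0 & 0 & \dfrac{2\rho}{\sigma_1^3 \sigma_2^4 (1 - \rho^2)^3} & \dfrac{2\rho}{\sigma_1^4 \sigma_2^3 (1 - \rho^2)^3} & \dfrac{4}{\sigma_1^4 \sigma_2^4 (1 - \rho^2)^2}
\end{pmatrix}$$
Hence we find
$$\operatorname{var} \hat\alpha = \frac{\sigma_1^2}{n}, \qquad \operatorname{var} \hat\beta = \frac{\sigma_2^2}{n},$$
$$\operatorname{var} \hat\sigma_1 = \frac{\sigma_1^2}{2n}, \qquad \operatorname{var} \hat\sigma_2 = \frac{\sigma_2^2}{2n}, \qquad \operatorname{var} \hat\rho = \frac{(1 - \rho^2)^2}{n}.$$
These results are already familiar. We have further
$$\operatorname{cov} (\hat\sigma_1, \hat\sigma_2) = \frac{\rho^2 \sigma_1 \sigma_2}{2n}, \qquad \operatorname{cov} (\hat\alpha, \hat\beta) = \frac{\rho \sigma_1 \sigma_2}{n},$$
$$\operatorname{cov} (\hat\rho, \hat\sigma_1) = \frac{\rho \sigma_1 (1 - \rho^2)}{2n}, \qquad \operatorname{cov} (\hat\rho, \hat\sigma_2) = \frac{\rho \sigma_2 (1 - \rho^2)}{2n}.$$
Hence the correlation between $\hat\sigma_1$ and $\hat\sigma_2$ is $\rho^2$, that between $\hat\alpha$ and $\hat\beta$ is $\rho$, and that between
$\hat\rho$ and $\hat\sigma_1$ or $\hat\sigma_2$ is $\rho / \sqrt{2}$.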
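The passage from the Hessian to the sampling variances in Example 17.18 is a matrix inversion, which can be checked numerically for particular parameter values. The sketch below assumes the standard expected-information entries per observation for $(\sigma_1, \sigma_2, \rho)$ in the bivariate normal form and inverts the 3 x 3 block by Gauss-Jordan elimination; the function names and the numerical values used to exercise them are hypothetical.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def scale_block(s1, s2, rho):
    """Information per observation for (sigma1, sigma2, rho); assumed standard form."""
    q = 1.0 - rho * rho
    return [
        [(2.0 - rho ** 2) / (s1 * s1 * q), -rho ** 2 / (s1 * s2 * q), -rho / (s1 * q)],
        [-rho ** 2 / (s1 * s2 * q), (2.0 - rho ** 2) / (s2 * s2 * q), -rho / (s2 * q)],
        [-rho / (s1 * q), -rho / (s2 * q), (1.0 + rho ** 2) / (q * q)],
    ]
```

The inverse, divided by $n$, gives the large-sample variances and covariances of $\hat\sigma_1$, $\hat\sigma_2$ and $\hat\rho$.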




Example 17.19 
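Consider the Type III distribution
$$dF = \frac{1}{\sigma \Gamma(p)} \left( \frac{x - \alpha}{\sigma} \right)^{p-1} \exp \left\{ -\left( \frac{x - \alpha}{\sigma} \right) \right\} dx, \qquad \alpha \le x \le \infty.$$
For the likelihood we have
$$\log L = -n \log \sigma - n \log \Gamma(p) + (p - 1) \sum \log \left( \frac{x - \alpha}{\sigma} \right) - \sum \left( \frac{x - \alpha}{\sigma} \right).$$
The three partial differential coefficients give
$$-(p - 1) \sum \left( \frac{1}{x - \alpha} \right) + \frac{n}{\sigma} = 0,$$
$$-\frac{np}{\sigma} + \sum \left( \frac{x - \alpha}{\sigma^2} \right) = 0,$$
$$-n \frac{d \log \Gamma(p)}{dp} + \sum \log \left( \frac{x - \alpha}{\sigma} \right) = 0.$$
For the Hessian, taking the parameters in the order $\alpha$, $\sigma$, $p$, we have
$$\Delta = \begin{vmatrix}
\dfrac{1}{\sigma^2 (p - 2)} & \dfrac{1}{\sigma^2} & \dfrac{1}{\sigma (p - 1)} \\
\dfrac{1}{\sigma^2} & \dfrac{p}{\sigma^2} & \dfrac{1}{\sigma} \\
\dfrac{1}{\sigma (p - 1)} & \dfrac{1}{\sigma} & \dfrac{d^2 \log \Gamma(p)}{dp^2}
\end{vmatrix}
= \frac{1}{(p - 2) \sigma^4} \left\{ 2 \frac{d^2 \log \Gamma(p)}{dp^2} - \frac{2}{p - 1} + \frac{1}{(p - 1)^2} \right\}.$$
From this the sampling variances are found to be
$$\operatorname{var} \hat\alpha = \frac{1}{n \Delta \sigma^2} \left\{ p \frac{d^2 \log \Gamma(p)}{dp^2} - 1 \right\},$$
$$\operatorname{var} \hat\sigma = \frac{1}{n \Delta \sigma^2} \left\{ \frac{1}{p - 2} \frac{d^2 \log \Gamma(p)}{dp^2} - \frac{1}{(p - 1)^2} \right\},$$
$$\operatorname{var} \hat{p} = \frac{2}{n \Delta (p - 2) \sigma^4}.$$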

Sufficient Estimators for Several Parameters 


17.47. As a natural generalisation from the case of one parameter we shall say that
$t_1, \ldots, t_p$ are jointly sufficient for $\theta_1, \ldots, \theta_p$ if, and only if, the likelihood function can
be expressed as
$$L(x_1, \ldots, x_n; \theta_1, \ldots, \theta_p) = L_1(t_1, \ldots, t_p; \theta_1, \ldots, \theta_p)\, L_2(x_1, \ldots, x_n). \qquad (17.123)$$
It evidently does not follow that if $\theta_2, \ldots, \theta_p$ are known $t_1$ is sufficient for $\theta_1$. This will
be true only if the function $L_1$ may itself be factorised, e.g.
$$L_1(t_1, \ldots, t_p; \theta_1, \ldots, \theta_p) = L_{11}(t_1; \theta_1, \ldots, \theta_p)\, L_{12}(t_2, \ldots, t_p; \theta_2, \ldots, \theta_p). \qquad (17.124)$$
If a case occurred in which
$$L_1 = L_{11}(t_1, \theta_1)\, L_{12}(t_2, \theta_2) \cdots L_{1p}(t_p, \theta_p) \qquad (17.125)$$




we might say that each $t$ was sufficient for the corresponding $\theta$ or that the set of $t$'s was
completely sufficient for the $\theta$'s. Such cases, however, are very rare.


Example 17.20 


From (17.113) it is evident that $\bar{x}$ and $s$ are jointly sufficient for $m$ and $\sigma$. If $\sigma$ is known
$\bar{x}$ is sufficient for $m$, but if $m$ is known $s$ is not sufficient for $\sigma$. The two are not completely
sufficient.


17.48. The properties of sufficient estimators may be proved true, with certain 
modifications, for several parameters, but we shall not take the subject further except 
to quote one result. 

If $f(x, \theta_1, \ldots, \theta_p)$ is continuous and not zero over some continuous range of the $\theta$'s,


Gh oy, ; : 
and a exists, then it is necessary and sufficient for the existence of a set of jointly sufficient 


estimators that
$$f = \exp \left\{ \sum_j A_j X_j + B + Y \right\}, \qquad (17.126)$$
where $A_j$ and $B$ are arbitrary functions of the $\theta$'s and $X_j$ and $Y$ of $x$. (See Koopman, 1936.)
Example 17.21 
The Type III distribution of Example 17.19 gives us
$$\log f = -p \log \sigma - \log \Gamma(p) + (p - 1) \log (x - \alpha) - \frac{x - \alpha}{\sigma}.$$
If $\alpha$ is regarded as known, this may be put in the form
$$\log f = -\frac{1}{\sigma} (x - \alpha) + (p - 1) \log (x - \alpha) - p \log \sigma - \log \Gamma(p),$$
which is of type (17.126) with
$$A_1 = -\frac{1}{\sigma}, \qquad X_1 = x - \alpha,$$
$$A_2 = p - 1, \qquad X_2 = \log (x - \alpha),$$
$$B = -p \log \sigma - \log \Gamma(p).$$
Thus if $\alpha$ is known, there are sufficient estimators for $\sigma$ and $p$ jointly. It will be clear on
inspection that if $\alpha$ is unknown there are no sufficient estimators, even if $\sigma$ and $p$ are known.


Parameters of Location and Scale 
17.49. Consider a frequency function expressed in the form
$$dF = g \left( \frac{x - \alpha}{\beta} \right) d \left( \frac{x - \alpha}{\beta} \right). \qquad (17.127)$$
The parameter $\alpha$ may be regarded as locating the distribution and $\beta$ as determining its
scale. In particular the normal distribution may be put in this form. We may write
$$dF = \exp \phi(\xi)\, d\xi = \exp \phi(\xi)\, \frac{dx}{\beta}, \qquad (17.128)$$
where
$$\xi = \frac{x - \alpha}{\beta}$$
and $\phi(\xi) = \log g(\xi)$.




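In samples of $n$ we have
$$\log L = \sum \phi - n \log \beta,$$
giving for the maximum likelihood estimators
$$\frac{\partial \log L}{\partial \alpha} = -\frac{1}{\beta} \sum \phi' = 0, \qquad (17.129)$$
$$\frac{\partial \log L}{\partial \beta} = -\frac{1}{\beta} \left\{ \sum (\phi' \xi) + n \right\} = 0, \qquad (17.130)$$
whence we may solve for $\hat\alpha$ and $\hat\beta$.
For the variances and covariance we find
$$E \left( \frac{\partial^2 \log f}{\partial \alpha^2} \right) = E \left( \frac{\phi''}{\beta^2} \right),$$
$$E \left( \frac{\partial^2 \log f}{\partial \alpha\, \partial \beta} \right) = \frac{1}{\beta^2} E \{ \phi'' \xi + \phi' \} = E \left( \frac{\phi'' \xi}{\beta^2} \right),$$
$$E \left( \frac{\partial^2 \log f}{\partial \beta^2} \right) = \frac{1}{\beta^2} E \{ \phi'' \xi^2 + 2 \phi' \xi + 1 \} = E \left( \frac{\phi'' \xi^2 - 1}{\beta^2} \right),$$
and the Hessian of (17.116) becomes
$$\begin{vmatrix}
-E \left( \dfrac{\phi''}{\beta^2} \right) & -E \left( \dfrac{\phi'' \xi}{\beta^2} \right) \\
-E \left( \dfrac{\phi'' \xi}{\beta^2} \right) & -E \left( \dfrac{\phi'' \xi^2 - 1}{\beta^2} \right)
\end{vmatrix} \qquad (17.131)$$
from which the variances and covariance of $\hat\alpha$ and $\hat\beta$ may be determined in the usual way.

In (17.131) it would be a great convenience if the quantity $-E(\phi'' \xi / \beta^2)$ vanished, for
then $\hat\alpha$ and $\hat\beta$ would be independent. By a suitable choice of origin we can, in fact, ensure
that this is so. Put
$$\zeta = \xi - \frac{E(\phi'' \xi)}{E(\phi'')}. \qquad (17.132)$$
Then
$$E(\phi'' \xi) = E \left\{ \phi'' \zeta + \phi'' \frac{E(\phi'' \xi)}{E(\phi'')} \right\} = E(\phi'' \zeta) + E(\phi'' \xi),$$
so that
$$E(\phi'' \zeta) = 0.$$
With this origin we have for the variances of the (uncorrelated) variables $\hat\alpha$ and $\hat\beta$,
$$\operatorname{var} \hat\alpha = -\frac{\beta^2}{n E(\phi'')}, \qquad (17.133)$$
$$\operatorname{var} \hat\beta = -\frac{\beta^2}{n \{ E(\phi'' \zeta^2) - 1 \}}. \qquad (17.134)$$
The point of location so defined, namely, as that for which $\hat\alpha$ and $\hat\beta$ are uncorrelated, has
been called by Fisher the centre of location.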




Example 17.22 
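For the normal distribution
$$dF = \frac{1}{\sqrt{2\pi}} \exp \left\{ -\frac{1}{2} \left( \frac{x - \alpha}{\beta} \right)^2 \right\} d \left( \frac{x - \alpha}{\beta} \right)$$
we have
$$\phi = \text{constant} - \tfrac{1}{2} \xi^2,$$
$$E(\phi'') = -1 \quad \text{and} \quad E(\phi'' \xi) = 0.$$
Hence $\zeta = \xi$, and the origin chosen is itself the centre of location. From (17.133) and
(17.134) we find the familiar results (for large samples)
$$\operatorname{var} \hat\alpha = \operatorname{var} \bar{x} = \frac{\beta^2}{n},$$
$$\operatorname{var} \hat\beta = \operatorname{var} s = \frac{\beta^2}{2n},$$
with $\bar{x}$ and $s$ uncorrelated.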


Example 17.23 
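Consider again the Type III distribution
$$dF = \frac{1}{\Gamma(p)} \left( \frac{x - \alpha}{\beta} \right)^{p-1} \exp \left\{ -\frac{x - \alpha}{\beta} \right\} d \left( \frac{x - \alpha}{\beta} \right), \qquad \alpha \le x \le \infty,$$
where we assume $p$ known. The condition $p > 1$ is required to ensure the vanishing of
the frequency function at the extremity $x = \alpha$, and $p > 2$ to ensure the convergence of
some of the mean values.

Here
$$\phi = \text{constant} - \xi + (p - 1) \log \xi.$$
Hence
$$E(\phi'') = E \left( -\frac{p - 1}{\xi^2} \right) = -\frac{1}{p - 2},$$
$$E(\phi'' \xi) = E \left( -\frac{p - 1}{\xi} \right) = -1.$$
Thus
$$\zeta = \xi - (p - 2).$$
The centre of location is distant $(p - 2)$ to the right of the start of the distribution. In
terms of $\zeta$ we have
$$\phi = \text{constant} - \zeta - (p - 2) + (p - 1) \log (\zeta + p - 2),$$
$$\phi'' = -\frac{p - 1}{(\zeta + p - 2)^2},$$
$$E(\phi'') = -1/(p - 2),$$
$$E(\phi'' \zeta^2 - 1) = -2.$$
Hence
$$\operatorname{var} \hat\alpha = \frac{\beta^2 (p - 2)}{n},$$
$$\operatorname{var} \hat\beta = \frac{\beta^2}{2n}.$$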
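The mean values used in Example 17.23 may be verified by quadrature. The sketch below assumes $\phi = \text{constant} - \xi + (p - 1)\log\xi$ with $\xi$ distributed in the Type III form with unit scale, and integrates by the midpoint rule over a truncated range; the truncation point, the step count and the function name are illustrative assumptions.

```python
import math


def type3_location_terms(p, upper=60.0, steps=60000):
    """Return (E[phi''], shift, E[phi'' zeta^2]) for the Type III form, p > 2.

    shift = E[phi'' xi] / E[phi''] is the distance of the centre of location
    from the start of the distribution, and zeta = xi - shift.
    """
    h = upper / steps
    log_norm = math.lgamma(p)
    e2 = e2_xi = e2_xi2 = 0.0
    for j in range(steps):
        xi = (j + 0.5) * h
        dens = math.exp((p - 1.0) * math.log(xi) - xi - log_norm)
        second = -(p - 1.0) / (xi * xi)  # phi''(xi)
        e2 += second * dens
        e2_xi += second * xi * dens
        e2_xi2 += second * xi * xi * dens
    e2, e2_xi, e2_xi2 = e2 * h, e2_xi * h, e2_xi2 * h
    shift = e2_xi / e2
    e2_zeta2 = e2_xi2 - 2.0 * shift * e2_xi + shift * shift * e2
    return e2, shift, e2_zeta2
```

With these values, $-1/E(\phi'')$ and $-1/\{E(\phi''\zeta^2) - 1\}$ give the multipliers of $\beta^2/n$ in the two variances.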




Efficiency of the Method of Moments 


17.50. In previous chapters we have fitted distributions of the Pearson type to 
other distributions by identifying lower moments. We were there mainly concerned with 
the properties of populations only and no question of the reliability of estimates arose. 
If, however, we regard the data as a sample from a population, the question arises whether 
fitting by moments provides the most efficient estimators of the unknown parameters. 
As we shall see presently, in general it does not. 

Consider a parent form dependent on four parameters. If the maximum likelihood
estimators of these parameters are to be obtained in terms of linear functions of the moments
(as in the fitting of Pearson curves), we must have
$$\frac{\partial \log L}{\partial \theta} = a_0 + a_1 \sum (x) + a_2 \sum (x^2) + a_3 \sum (x^3) + a_4 \sum (x^4) \qquad (17.135)$$
and consequently
$$f(x; \theta_1, \theta_2, \theta_3, \theta_4) = \exp \{ b_0 + b_1 x + b_2 x^2 + b_3 x^3 + b_4 x^4 \}, \qquad (17.136)$$
where the $b$'s depend on the $\theta$'s. This is the most general form for which the method of
moments gives maximum likelihood estimators. The $b$'s are, of course, conditioned by
the fact that the total frequency shall be unity and the distribution function converge.

Without loss of generality we may take $b_1 = 0$. If, then, the other $b$'s vanish except
$b_0$ and $b_2$ the distribution is normal and the method of moments is most efficient. In
other cases, (17.136) does not yield a Pearson distribution except as an approximation.
For example,
$$\frac{d \log f}{dx} = 2 b_2 x + 3 b_3 x^2 + 4 b_4 x^3.$$
If $b_3$ and $b_4$ are small this is approximately
$$\frac{d \log f}{dx} = \frac{2 b_2 x}{1 - \dfrac{3 b_3}{2 b_2} x - \dfrac{2 b_4}{b_2} x^2}, \qquad (17.137)$$
which is one form of the equation defining Pearson distributions (cf. 6.2). Only when
$b_3$ and $b_4$ are small compared with $b_2$ can we expect the method of moments to give estimates
of high efficiency.


17.51. A detailed discussion of the efficiency of moments in determining the parameters
of a Pearson distribution has been given by Fisher (1921a). We will here quote
only one of the results by way of illustration.

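We found in Example 17.19 that the variance for large samples of the maximum
likelihood estimator $\hat{p}$ is given by

$$\operatorname{var} \hat{p} = \frac{2}{n\left\{2\dfrac{d^2 \log \Gamma(p)}{dp^2} - \dfrac{2}{p-1} + \dfrac{1}{(p-1)^2}\right\}} \qquad (17.137)$$

or, if $p' = p - 1$, by

$$\operatorname{var} \hat{p} = \frac{2}{n\left\{2\dfrac{d^2 \log \Gamma(1+p')}{dp'^2} - \dfrac{2}{p'} + \dfrac{1}{p'^2}\right\}}. \qquad (17.138)$$

Now for large $p'$,*

$$\frac{d^2}{dp'^2}\log \Gamma(1+p') = \frac{d^2}{dp'^2}\left\{\tfrac{1}{2}\log(2\pi) + (p' + \tfrac{1}{2})\log p' - p' + \frac{1}{12p'} - \frac{1}{360p'^3} + \frac{1}{1260p'^5} - \dots\right\}.$$

We then find

$$2\frac{d^2}{dp'^2}\log \Gamma(1+p') - \frac{2}{p'} + \frac{1}{p'^2} = \frac{1}{3p'^3} - \frac{1}{15p'^5} + \dots$$

and hence, approximately,

$$\operatorname{var} \hat{p} = \frac{6}{n}\left\{p'^3 + \tfrac{1}{5}p'\right\}. \qquad (17.139)$$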


If we estimate the parameters by equating sample-moments to the appropriate moments
in terms of parameters, we find

$$\alpha + \sigma p = m_1', \qquad \sigma^2 p = m_2, \qquad 2p\sigma^3 = m_3,$$

so that, whatever $\alpha$ and $\sigma$ may be,

$$b_1 = \frac{m_3^2}{m_2^3} = \frac{4}{p}, \qquad (17.140)$$


where $b_1$ is the sample value of $\beta_1$. Now for estimation by the method of moments (cf.
OF22), 


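$$\operatorname{var} b_1 = \frac{\beta_1}{n}\left(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1\right),$$

which for the present distribution is readily seen to reduce to

$$\operatorname{var} b_1 = \frac{96}{n}\,\frac{(p+1)(p+5)}{p^3}. \qquad (17.141)$$

Hence, from (17.140) we have for $p$, estimated by the method of moments,

$$\operatorname{var} p = \frac{p^4}{16}\operatorname{var} b_1 = \frac{6}{n}\,p(p+1)(p+5).$$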


For large $p$ the efficiency of this estimator is then, from (17.139) with $p = 1 + p'$,

$$\frac{p'\left(p'^2 + \tfrac{1}{5}\right)}{(p'+1)(p'+2)(p'+6)},$$

which is evidently short of unity in many cases. When $p'$ exceeds 38.1 ($\beta_1 = 0.102$) the
efficiency is over 80 per cent. For $p' = 19$ ($\beta_1 = 0.20$) it is 65 per cent. For $p' = 4$ a more
exact calculation based on the tables of the trigamma function $\dfrac{d^2 \log \Gamma(1+p')}{dp'^2}$ shows
that the efficiency is only 22 per cent.
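
The percentages quoted above can be checked numerically. The following is a minimal sketch, not part of the original text: the function names are illustrative, `p_shift` stands for $p - 1$, and the trigamma function is approximated by its recurrence and asymptotic series instead of being read from tables.

```python
import math


def trigamma(x: float) -> float:
    """Approximate d^2 log Gamma(x) / dx^2 for x > 0."""
    total = 0.0
    # Recurrence psi'(x) = psi'(x + 1) + 1/x^2 moves the argument above 6.
    while x < 6.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # Asymptotic series 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    tail = inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30))
    return total + inv + 0.5 * inv2 + inv * inv2 * (1.0 / 6 - tail)


def efficiency_series(p_shift: float) -> float:
    """Large-p approximation to the efficiency, with p_shift = p - 1."""
    q = p_shift
    return q * (q * q + 0.2) / ((q + 1) * (q + 2) * (q + 6))


def efficiency_exact(p_shift: float) -> float:
    """Ratio of n*var(ML), from (17.138), to n*var(moments) = 6p(p+1)(p+5)."""
    q = p_shift
    n_var_ml = 2.0 / (2.0 * trigamma(1.0 + q) - 2.0 / q + 1.0 / (q * q))
    p = 1.0 + q
    n_var_moments = 6.0 * p * (p + 1.0) * (p + 5.0)
    return n_var_ml / n_var_moments


if __name__ == "__main__":
    for q in (38.1, 19.0, 4.0):
        print(q, round(efficiency_series(q), 3), round(efficiency_exact(q), 3))
```

The series form is adequate for the two larger values; for the smallest value the exact form is the one to trust, since the neglected terms of (17.139) are then no longer small.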


* The series for the $\log \Gamma$ function is given in most books on advanced calculus, e.g. J. Edwards,
Integral Calculus, vol. 2, article 942.




NOTES AND REFERENCES 


The greater part of this chapter is based on the researches of R. A. Fisher, the main
papers being those of 1921a, 1925b and 1934a. The idea of maximising likelihood may
be traced back to Gauss and was considered by Edgeworth, but may be regarded as beginning
to exercise an influence on statistical theory only with the publication of Fisher’s
first paper in 1912.

The theorem giving the limiting variances and covariances of maximum likelihood
estimates was proved (incorrectly) by Karl Pearson and Filon in 1898 before it was realised
that it applied only to maximum likelihood. The necessary correction was given by Edgeworth
(1908) and Fisher (1921a), but rigorous proofs were not available until the work of
Hotelling (1930) and Doob (1934a and b, 1935, 1936). In the text we have followed
Hotelling’s treatment.

The inefficiency of moments in fitting distributions, pointed out by Fisher (1921a), 
has led to some controversy, for which see Koshal (1933, 1935), Myers (1934), Elderton 
and Hansmann (1934), K. Pearson (1936), and Fisher (1937a). The reader who pursues 
this subject so far as to read any one of these papers should read them all. 

For work on sufficient estimators see Koopman (1936) and Pitman (1936, 1937b), who
independently obtained the general form of distribution admitting such estimators. The
theorem that sufficient estimators have the property 17.17 is due to Fisher, rigorous proofs
being provided by Neyman (1935a) and Dugué (1936a). Reference should also be made
to papers by Bartlett (1936a, b, 1937c, 1938b, 1939a, 1940) on the problem of several parameters
and what he calls “conditional” statistics, i.e. those similar to $s^2$ when $\bar{x}$ or some
other function of the sample values is regarded as known. See also Neyman and Pearson
(19362). 

Among recent papers, that by Pitman (1939a) on parameters of scale and location, 
and that by Welch (1939c) on the distribution of maximum likelihood estimates, are 
noteworthy. 

Geary (1942) has recently proved a remarkable generalisation of the theorem that 
in large samples maximum-likelihood estimators have minimum variance in the case of 
one parameter. In fact, for several parameters the maximum likelihood estimators 
minimise the “generalised variance” as defined in Chapter 28.


EXERCISES 


17.1. If $t$ is a most-efficient estimator and $t'$ a less-efficient estimator with efficiency
$E$, and if the correlation of $t$ and $t'$ is $\rho$, show by considering the estimator $t''$ defined by

$$(1 + E - 2\rho\sqrt{E})\,t'' = (1 - \rho\sqrt{E})\,t + (E - \rho\sqrt{E})\,t'$$

that $\rho = \sqrt{E}$ (for in the contrary case $\operatorname{var} t'' < \operatorname{var} t$).
(Fisher, 1925b.)


17.2. If in $n$ trials of an event with probability $p$ there are $x$ successes, show that
a maximum likelihood estimator of $p$ is $x/n$. Find its sampling variance and show that
it is sufficient.


17.3. Show that the distribution

$$dF = \tfrac{1}{2}\exp\{-|x - \theta|\}\,dx, \qquad -\infty \le x \le \infty$$

has a likelihood function for a sample of $n$ which is a maximum at the median if $n$ is odd
and between the $(n/2)$th and $(n/2 + 1)$th members if $n$ is even.


17.4. For the distribution of the previous exercise show that for a sample of $(2m + 1)$
members the median has an accuracy

$$\frac{(m+1)(2m+1)}{m-1}\left\{1 - \frac{(2m)!}{2^{2m-1}(m!)^2}\right\}.$$

Hence, as $m$ tends to infinity, the loss of information tends to $4\sqrt{m/\pi} - 4$. Thus,
although the median is most-efficient the loss of information in large samples does not
tend to a constant.
(Fisher, 1925b.) 


17.5. Show that if a most-efficient estimator A and a less-efficient estimator B tend 
to joint normality for large samples, B — A tends to zero correlation with A. 

Show that the error in B may be regarded as composed (for large samples) of two 
parts which are independent, the error in A and the error in B — A. (The first may be 
regarded as sampling error, necessarily inherent in the problem of estimation, the second 
as error due to the inefficiency of the estimator.) 

(Fisher, 1925b.)


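17.6. Show that the distribution of the median in a sample of $(2m + 1)$ observations
from the population

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}$$

is given by

$$dF = \frac{(2m+1)!}{(m!)^2\,\pi^{2m+1}}\left(\frac{\pi^2}{4} - \phi^2\right)^m \frac{dx}{1 + (x - \theta)^2},$$

where $\tan\phi = x - \theta$ and $|\phi| < \tfrac{1}{2}\pi$.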


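Show hence that the accuracy of the median is

$$\frac{(2m+1)!}{(m!)^2\,\pi^{2m+1}}\int_{-\pi/2}^{\pi/2}\left\{2m\phi\cos^2\phi + \left(\frac{\pi^2}{4} - \phi^2\right)\sin 2\phi\right\}^2\left(\frac{\pi^2}{4} - \phi^2\right)^{m-2} d\phi$$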
3m (2m +1), (m + 4)! Gia {ae A ee cs =) } 


2 (m — 1) 2? 2m—1 \a 


on) m+4 3 
4 (m +3) (2) { alll Bi ((), SUE (2n) | 


dF 


is given by 


=4+ 


I m— 1 
where J, (z) is the Bessel function of order m and in particular J, (a) = J ; (20) = 0, 
mee ea 


2n 


Invi = a In = Oe 
(Fisher, 1925b.)




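17.7. Show that the most general continuous distribution for which the maximum
likelihood estimator of a parameter $\theta$ is the geometric mean of the sample is

$$f(x, \theta) = \left(\frac{x}{\theta}\right)^{\theta\,\partial\psi/\partial\theta}\exp\{\psi(\theta) + \xi(x)\},$$

where $\psi$ is an arbitrary function of $\theta$, and $\xi$ of $x$. Show further that the corresponding
distribution giving the harmonic mean is

$$f(x, \theta) = \exp\left[\frac{1}{x}\left\{\theta\frac{\partial\psi}{\partial\theta} - \psi\right\} - \frac{\partial\psi}{\partial\theta} + \xi(x)\right].$$

(Keynes, J.R.S.S. (1911), 74, 323.)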


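17.8. Show that, if $m$ is known, the estimator

$$s = \left\{\frac{\Sigma(x - m)^2}{n}\right\}^{1/2}$$

is sufficient for $\sigma$ in samples of $n$ from

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}(x - m)^2\right\}dx,$$

and find its distribution by the method of 17.31.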


17.9. By considering the distribution 
$$dF = e^{\theta - x}\,dx, \qquad \theta \le x \le \infty,$$


show that the three forms of (17.97) are not necessarily equivalent when the range contains 
the parameter to be estimated. 
(Pitman, 1936.) 


17.10. Show that if the frequency function is continuous and is zero at an extreme
which is a function of $\theta$, there still exists a maximum to the intrinsic accuracy, defined
as $E\left(\dfrac{\partial \log f}{\partial\theta}\right)^2$.
(Pitman, 1936.)


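17.11. By considering the distribution

$$dF = \frac{2x\,dx}{2\theta + 1}, \qquad \theta \le x \le \theta + 1,$$

show that the intrinsic accuracy is $4n^2/(2\theta + 1)^2$. Show further that the largest member
of the sample is sufficient for $\theta$ and that its distribution is

$$dF = \phi(x)\,dx = \frac{2nx\,(x^2 - \theta^2)^{n-1}}{(2\theta + 1)^n}\,dx, \qquad \theta \le x \le \theta + 1.$$

Hence show that

$$E\left(\frac{\partial \log \phi}{\partial\theta}\right)^2 = \frac{4n^2(\theta + 1)^2}{(2\theta + 1)^2} + \frac{4n\theta^2}{(n - 2)(2\theta + 1)^2},$$

so that the mean value in this case is greater than the intrinsic accuracy.
(Pitman, 1936.)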


17.12. If the frequency function of an estimator $t$ is $\phi$ its accuracy is
$E\left\{\dfrac{1}{\phi^2}\left(\dfrac{\partial\phi}{\partial\theta}\right)^2\right\}$.
If every possible sample with frequency $f$ gave a different value of $t$ the accuracy would
be $E\left\{\dfrac{1}{f^2}\left(\dfrac{\partial f}{\partial\theta}\right)^2\right\}$ and would be independent of $t$. Show that the difference in accuracy
may be expressed as

$$E\left\{\left(\frac{1}{f}\frac{\partial f}{\partial\theta} - \frac{1}{\phi}\frac{\partial\phi}{\partial\theta}\right)^2\right\}$$

and hence is not negative.
Hence show that the efficiency as defined in 17.36 cannot exceed unity, at least if the
range is independent of $\theta$.
(Fisher, 1925b.)


17.13. Show that

$$dF = \frac{1}{\pi}\,\frac{\theta_2\,dx}{\theta_2^2 + (x - \theta_1)^2}, \qquad -\infty \le x \le \infty$$

does not admit of a sufficient estimator for either parameter if the other is known, or
a pair of jointly sufficient estimators if both are unknown.
(Koopman, 1936.)


17.14. Show that if a distribution admits a sufficient estimator for either of two 
parameters when the other is known, it admits of a pair of jointly sufficient estimators 


when both parameters are unknown. 
(Koopman, 1936.) 


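17.15. Show that the centre of location of the Type IV distribution

$$dF \propto \exp\left\{-\nu\tan^{-1}\frac{x - \alpha}{a}\right\}\left\{1 + \frac{(x - \alpha)^2}{a^2}\right\}^{-\frac{1}{2}(p+2)} dx, \qquad -\infty \le x \le \infty,$$

where $\nu$ and $p$ are assumed known, is distant $\dfrac{\nu a}{p + 4}$ to the left of the mode of the distribution.
(Fisher, 1921a.)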


17.16. For the distribution

$$dF = \frac{dx}{a}, \qquad -\frac{a}{2} \le x \le \frac{a}{2},$$

show that, in large samples, the mean tends to the form

$$dF = \frac{1}{a}\sqrt{\frac{6n}{\pi}}\exp\left(-\frac{6nx^2}{a^2}\right)dx.$$

Show further that the distribution of the centre of the sample, say $c$ (the mean of the two
extreme values), tends to

$$dF = \frac{n}{a}\exp\left\{-\frac{2n}{a}|x|\right\}dx.$$

Hence

$$\frac{\operatorname{var} c}{\operatorname{var} \bar{x}} = \frac{6}{n},$$

so that the centre is a far better estimator of location than the mean for this distribution.
(Fisher, 1921a.)


17.17. Show that for the Type I distribution

$$dF = \frac{1}{B(p, q)}\,x^{p-1}(1 - x)^{q-1}\,dx, \qquad 0 \le x \le 1,$$

the geometric mean of the sample values $x$ and that of the values $(1 - x)$ are jointly
sufficient for the estimation of $p$ and $q$.


17.18. Show that all the Pearson distributions have sufficient estimators for some 
of the parameters if the others are assumed known, and ascertain which are the parameters 
concerned for each type. 


17.19. For the distribution of Exercise 17.15 show that the intrinsic accuracy for $\alpha$ is

$$\frac{(p+1)(p+2)(p+4)}{a^2\{(p+4)^2 + \nu^2\}}$$

and that the efficiency of the method of moments in locating the curve is

$$\frac{p^2(p-1)\{(p+4)^2 + \nu^2\}}{(p+1)(p+2)(p+4)(p^2 + \nu^2)}.$$

(Fisher, 1921a.)




CHAPTER 18 
ESTIMATION: MISCELLANEOUS METHODS 


Minimum Variance 

18.1. We have seen in the previous chapter that under certain general conditions 
the maximum likelihood estimator is most-efficient for large samples, and that for finite 
samples it leads to sufficient estimators where such exist. Sufficient estimators themselves 
contain all the information in the sample about the parameter under estimate. What 
we have not shown, however, is that maximum likelihood estimators have minimum variance 
in finite samples. 

We now consider the subject from a slightly different standpoint. Instead of beginning
with the criteria of efficiency and sufficiency and showing that they lead to certain
minimal properties, we shall examine the class of estimators which (a) are unbiassed and
(b) have minimum variance. The minimal property is here taken as the starting-point.


18.2. Consider, then, a frequency function $f(x, \theta)$, and as usual let us write
$L = f(x_1, \theta) \dots f(x_n, \theta)$. Then, writing $\int dx$ for the $n$-fold integral over the range
of the $x$'s, we have to find $t = t(x_1, \dots, x_n)$ such that

$$\int tL\,dx = \theta, \qquad (18.1)$$

$$\int (t - \theta)^2 L\,dx = \text{minimum}. \qquad (18.2)$$

The first equation may also be written

$$\int (t - \theta)L\,dx = 0. \qquad (18.3)$$

The problem of finding $t$ is one of the familiar problems in the Calculus of Variations. The
minimal value of (18.2) has to be found subject to the condition (18.1), which is equivalent to

$$\int t\,\frac{\partial L}{\partial\theta}\,dx = 1, \qquad (18.4)$$

provided that the range of $f$ is independent of $\theta$ or that $f$ vanishes at any extreme which
depends on $\theta$.

If $2\lambda$ is an unspecified parameter (which may depend on $\theta$ but not on the $x$'s) the
problem is equivalent to finding an unconditioned minimum of

$$\int\left\{(t - \theta)^2 L - 2\lambda t\,\frac{\partial L}{\partial\theta}\right\}dx. \qquad (18.5)$$

The solution is *

$$\frac{\partial}{\partial t}\left\{(t - \theta)^2 L - 2\lambda t\,\frac{\partial L}{\partial\theta}\right\} = 0,$$

or

$$(t - \theta)L - \lambda\frac{\partial L}{\partial\theta} = 0. \qquad (18.6)$$

* See, for example, J. Edwards, Integral Calculus, vol. 2, article 1504, or A. R. Forsyth, Calculus
of Variations, article 15. Since the expression to be minimised does not contain $\partial t/\partial x$ the Euler equation
for a stationary value to the integral $\int V\,dx$ reduces to $\partial V/\partial t = 0$. The derivation of (18.7) is not,
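We then have

$$t = \theta + \frac{\lambda}{L}\frac{\partial L}{\partial\theta} = \theta + \lambda\frac{\partial \log L}{\partial\theta}, \qquad (18.7)$$

where $t$ is a function of the $x$'s but not of $\theta$. Thus there exists a $t$ satisfying our conditions
if we can express $\dfrac{\partial \log L}{\partial\theta}$ in the form

$$\frac{\partial \log L}{\partial\theta} = \frac{t - \theta}{\lambda}. \qquad (18.8)$$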

This is a necessary and sufficient condition, except that it gives only stationary values of 
(18.2) which might, for instance, be maxima instead of minima. This is not a point, 
however, which need detain us from the statistical viewpoint, troublesome as it is to the 
mathematician. 


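Example 18.1
To estimate $\theta$ in the normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{1}{2\sigma^2}(x - \theta)^2\right\}dx, \qquad -\infty \le x \le \infty,$$

where $\sigma$ is assumed known.
We have

$$\frac{\partial \log L}{\partial\theta} = \frac{n}{\sigma^2}(\bar{x} - \theta).$$

This can be put in the form (18.8) by taking

$$t = \bar{x} \quad \text{and} \quad \lambda = \frac{\sigma^2}{n},$$

and hence $\bar{x}$ is the required estimator. We note that it has minimum variance for any
$n$ in the class of unbiassed estimators of $\theta$.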


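Example 18.2
To estimate $\theta$ in

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty \le x \le \infty.$$

We have

$$\frac{\partial \log L}{\partial\theta} = 2\Sigma\frac{x - \theta}{1 + (x - \theta)^2}.$$

This cannot be put in the form (18.8) and the method fails. There is no estimator which
is unbiassed and has minimum variance.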


however, without its difficulties, and I think some conditions have been accidentally suppressed in
the Aitken-Silverstone method. I understand that Dr. Leon Solomon, working with Dr. Aitken, has
obtained a proof which depends on the fact that L shall be the product of n independent frequency
functions. But for the war the point would doubtless have been cleared up by now, but at present
it remains open.




18.3. Integrating (18.8) with respect to $\theta$ we have

$$\log L = \alpha(\theta)(t - \theta) + \beta(\theta) + \Sigma\gamma(x),$$

where $\alpha$, $\beta$, $\gamma$ are arbitrary functions (apart from the fact that the two former depend on
$\lambda$). Hence

$$\log f(x, \theta) = A(\theta)(t - \theta) + B(\theta) + C(x)$$

$$= p(\theta)\,t(x) + q(\theta) + r(x), \text{ say.} \qquad (18.9)$$

Comparing this with (17.83), we see that the method of minimum variance will give a
solution only if there exists a sufficient estimator. This explains the success of the method
in Example 18.1 (where $\bar{x}$ is sufficient) and its failure in Example 18.2 (where no sufficient
estimator exists).


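18.4. In the method of maximum likelihood it makes no difference to the final
result whether we estimate for a parameter $\theta$ or for some other parameter $\psi$ functionally
related to $\theta$. For

$$\frac{\partial \log L}{\partial\theta} = \frac{\partial \log L}{\partial\psi}\,\frac{\partial\psi}{\partial\theta}$$

and the two sides of the equation vanish together. In the method of minimum variance,
however, there is an interesting difference.
Suppose we wish to estimate $\theta$ in

$$dF = \frac{1}{\sqrt{2\pi\theta}}\exp\left(-\frac{x^2}{2\theta}\right)dx, \qquad -\infty \le x \le \infty.$$

We have

$$\frac{\partial \log L}{\partial\theta} = -\frac{n}{2\theta} + \frac{\Sigma x^2}{2\theta^2},$$

and this may be put in the form (18.8) with

$$t = \frac{\Sigma x^2}{n} \quad \text{and} \quad \lambda = \frac{2\theta^2}{n}.$$

If, however, we consider the parallel problem of estimating $\sigma$ in

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)dx, \qquad -\infty \le x \le \infty,$$

we find

$$\frac{\partial \log L}{\partial\sigma} = -\frac{n}{\sigma} + \frac{\Sigma x^2}{\sigma^3},$$

which cannot be put in the form (18.8). We thus reach the peculiar result that the method
will provide an estimator for $\sigma^2$ but not for $\sigma$. It follows that in general we may have
to estimate, not $\theta$ itself, but some function of $\theta$, say $\tau(\theta)$.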


18.5. If a minimum-variance estimator exists for some $\tau(\theta)$ we must have

$$\frac{\partial \log L}{\partial\tau} = \frac{t - \tau}{\lambda},$$

which is equivalent to

$$\frac{\partial \log L}{\partial\theta} = \frac{\partial\tau}{\partial\theta}\,\frac{t - \tau}{\lambda}. \qquad (18.10)$$

We estimate $\tau$ by putting it equal to $t$ and thus we shall have, for the estimator,

$$\frac{\partial \log L}{\partial\theta} = 0. \qquad (18.11)$$

This is equivalent to the equation of maximum likelihood. The two are not, however,
identical. Maximum likelihood is not concerned with the existence of the function $\lambda$.
Minimum variance takes the function as fundamental, and when it exists the solution
(which is the same as the maximum likelihood solution) has minimum variance for all $n$
in the class of unbiassed estimators, not merely for large $n$.


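18.6. Let us suppose that $\theta$ is the parameter (transformed if necessary) for which
the estimating function is $\theta$ itself. Then we have for the minimum-variance estimator $t$

$$\operatorname{var} t = \int (t - \theta)^2 L\,dx,$$

which, on substitution from (18.8), yields

$$\operatorname{var} t = \lambda^2\int\left(\frac{\partial \log L}{\partial\theta}\right)^2 L\,dx \qquad (18.12)$$

$$= -\lambda^2\int\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)L\,dx, \qquad (18.13)$$

if the range is independent of $\theta$ or $f$ vanishes at any extreme dependent on $\theta$.
Now from (18.8) we find

$$\frac{\partial^2 \log L}{\partial\theta^2} = -\frac{1}{\lambda} + (t - \theta)\frac{\partial}{\partial\theta}\left(\frac{1}{\lambda}\right),$$

and hence, substituting in (18.13) and remembering that $\int (t - \theta)L\,dx = 0$, we find

$$\operatorname{var} t = \lambda. \qquad (18.14)$$

The variance of the minimum-variance estimator is thus simply the parameter $\lambda$. It also
follows from (18.13) that

$$\frac{1}{\operatorname{var} t} = -E\left(\frac{\partial^2 \log L}{\partial\theta^2}\right) = -nE\left(\frac{\partial^2 \log f}{\partial\theta^2}\right), \qquad (18.15)$$

so that the result we reached in Chapter 17, as a limiting form for large $n$, is now seen to
be exact for finite $n$ under present conditions.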


Example 18.3
To estimate $\theta$ in the Type III form

$$dF = \frac{1}{\theta^p\,\Gamma(p)}\,x^{p-1}e^{-x/\theta}\,dx, \qquad 0 \le x \le \infty, \quad \theta > 0,$$

where $p$ is assumed known.

We have

$$\frac{\partial \log L}{\partial\theta} = -\frac{np}{\theta} + \frac{n\bar{x}}{\theta^2},$$

which is of the form (18.8) if

$$t = \frac{\bar{x}}{p} \quad \text{and} \quad \lambda = \frac{\theta^2}{np}.$$

Thus $t$ is the minimum-variance estimator and has variance $\dfrac{\theta^2}{np}$ for finite $n$, even though
the distribution is not normal. (Compare Example 17.8.)
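
The finite-sample statement in this example can be checked by simulation. What follows is only a sketch under stated assumptions: the function name, the parameter values and the number of repetitions are illustrative choices, not taken from the text. It draws repeated samples from the Type III form and compares the variance of $\bar{x}/p$ with $\theta^2/(np)$.

```python
import random


def simulate_estimator(theta: float, p: float, n: int, reps: int, seed: int = 0):
    """Return the mean and variance of t = (sample mean)/p over `reps` samples of size n."""
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        # random.gammavariate(alpha, beta) uses shape alpha and scale beta,
        # which matches the Type III form with shape p and scale theta.
        xbar = sum(rng.gammavariate(p, theta) for _ in range(n)) / n
        values.append(xbar / p)
    mean = sum(values) / reps
    variance = sum((v - mean) ** 2 for v in values) / reps
    return mean, variance


if __name__ == "__main__":
    theta, p, n = 2.0, 3.0, 5
    mean, variance = simulate_estimator(theta, p, n, reps=20000)
    print("mean of t:", round(mean, 3), "target:", theta)
    print("variance of t:", round(variance, 4), "target:", round(theta ** 2 / (n * p), 4))
```

A small sample size is used deliberately, since the point of the example is that the variance formula holds for finite $n$ and not merely in the limit.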


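18.7. We may readily determine what function $\tau(\theta)$ should be taken as the estimating
function. Taking the general form from (18.9),

$$\log f(x, \theta) = p(\theta)\,t(x) + q(\theta) + r(x),$$

we have

$$\log L = p\,\Sigma t(x) + nq + \Sigma r(x),$$

$$\frac{\partial \log L}{\partial\tau} = \frac{\partial p}{\partial\tau}\Sigma t(x) + n\frac{\partial q}{\partial\tau} = n\frac{\partial p}{\partial\tau}\left\{\frac{1}{n}\Sigma t(x) + \frac{\partial q}{\partial p}\right\}. \qquad (18.16)$$

Hence, if

$$\tau = -\frac{\partial q}{\partial p} = -\frac{\partial q}{\partial\theta}\Big/\frac{\partial p}{\partial\theta}, \qquad (18.17)$$

we have

$$\frac{\partial \log L}{\partial\tau} = \frac{\dfrac{1}{n}\Sigma t(x) - \tau}{\dfrac{1}{n}\dfrac{\partial\tau}{\partial p}}, \qquad (18.18)$$

which is of the required form provided that

$$\lambda = \frac{1}{n}\frac{\partial\tau}{\partial p}. \qquad (18.19)$$

Example 18.4
Consider again the estimation of $\sigma$ in

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)dx, \qquad -\infty \le x \le \infty.$$

Here

$$\log f = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{x^2}{2\sigma^2},$$

whence $t(x) = x^2$, $p = -\dfrac{1}{2\sigma^2}$, $q = -\log\sigma$.
Thus the appropriate value of $\tau$, from (18.17), is

$$\tau = -\frac{\partial q}{\partial\sigma}\Big/\frac{\partial p}{\partial\sigma} = \sigma^2,$$

which is thus determined as our estimating function. For the variance of the estimator
of $\tau$ we have

$$\lambda = \frac{1}{n}\frac{\partial\tau}{\partial p} = \frac{2\sigma^4}{n},$$

the estimator itself being $\dfrac{1}{n}\Sigma(x^2)$.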


Minimum $\chi^2$


18.8. We now turn to consider another principle which has been suggested for providing
estimators. If the data are grouped into cells with expected frequency typified
by $\lambda_j$ and observed frequency by $l_j$, then the function

$$\chi^2 = \Sigma_j\frac{(\lambda_j - l_j)^2}{\lambda_j}, \qquad (18.20)$$

where

$$n = \Sigma(\lambda) = \Sigma(l), \qquad (18.21)$$

can, as we saw in Chapter 12, be used as a measure of closeness of fit. The method of
minimum $\chi^2$ adopts this standpoint (which is, of course, arbitrary in the logical sense)
and attempts to determine the parameters $\lambda$ such that $\chi^2$ is a minimum.

In practice the method is not very easy to apply because of the difficulty of expressing
the $\lambda$'s in terms of the parameter under estimate, $\theta$. For some illustrations reference
may be made to Kirstine Smith (1916). We shall not consider the method at length
here for two reasons :

(a) it may be shown that for large samples the minimum-$\chi^2$ estimator tends to
the maximum-likelihood estimator ;

(b) there is a modification of the method, considered below, which is much easier
to apply.


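18.9. For samples of fixed size $n$ the distribution of the quantities $l_j$ is multinomial,
and we have for the likelihood function

$$L = n!\,\Pi_j\,\frac{1}{l_j!}\left(\frac{\lambda_j}{n}\right)^{l_j}. \qquad (18.22)$$

Thus

$$\log L = \text{constant} + \Sigma_j\,l_j\log\left(\frac{\lambda_j}{l_j}\right). \qquad (18.23)$$

Now for large samples we may put

$$\lambda_j = l_j + a_j n^{1/2},$$

where $a_j$ is finite, so that $|a_j n^{1/2}|$ is small compared with $l_j$, and $\Sigma(a_j) = 0$.
Hence, from (18.23),

$$\log L = k + \Sigma_j\,l_j\log\left(1 + \frac{a_j n^{1/2}}{l_j}\right)$$

$$= k + \Sigma_j\left\{a_j n^{1/2} - \frac{a_j^2 n}{2l_j}\right\} + O(n^{-1/2})$$

$$= k - \tfrac{1}{2}\Sigma_j\frac{a_j^2 n}{l_j} + O(n^{-1/2}). \qquad (18.24)$$

Now write

$$\chi'^2 = \Sigma_j\frac{(\lambda_j - l_j)^2}{l_j} = \Sigma_j\frac{a_j^2 n}{l_j}. \qquad (18.25)$$

Then we see that, to order $n^{-1/2}$, $L$ is maximised by minimising $\chi'^2$. This latter quantity
is not the same as $\chi^2$ because the denominator terms are $l$'s instead of $\lambda$'s. However, for
large $n$ the difference is of order $n^{-1/2}$, for

$$\chi^2 - \chi'^2 = \Sigma_j(\lambda_j - l_j)^2\left\{\frac{1}{\lambda_j} - \frac{1}{l_j}\right\} = O(n^{-1/2}).$$

Hence, to order $n^{-1/2}$ the estimates obtained by minimising either $\chi^2$ or $\chi'^2$ will be equivalent
to maximising $L$.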


18.10. The advantage of using $\chi'^2$ instead of $\chi^2$ in practice resides in the fact that
the denominators in the former are integral. However, if there are any empty cells (i.e.
those for which $l_j = 0$) the formula (18.25) requires some modification.

In the likelihood function, if $l_j = 0$, $\left(\dfrac{\lambda_j}{n}\right)^{l_j} = 1$ for all $\lambda_j$. The substitution
will give us, for the empty cells, a term in (18.24) equal to $-\Sigma a_j n^{1/2} = -\Sigma\lambda_j = -M$,
say. Hence we have

$$\chi'^2 = \Sigma_j\frac{(\lambda_j - l_j)^2}{l_j} + 2M, \qquad (18.26)$$

where the summation takes place over occupied cells and $M$ is the sum of the theoretical
frequencies $\lambda$ in the empty cells.


Example 18.5

As an example (Jeffreys, 1941) we consider a case where the maximum likelihood
estimator is known, so that a comparison may be made with the result given by
minimum $\chi'^2$.

Col. (2) of the following table shows the frequency of women in the first class of Part II
of the Mathematical Tripos from 1910 to 1938 inclusive. Assuming that this distribution
follows the Poisson distribution $e^{-\theta}\theta^j/j!$, to estimate $\theta$.

   (1)          (2)                  (3)                            (4)
 Number of   Frequency     Expected frequency λ_j         Contribution to χ'²
 firsts, j      l_j       θ = 1   θ = 1.5   θ = 2       θ = 1     θ = 1.5    θ = 2
    0            6        10.7     6.5      3.9          3.7       0.0       0.7
    1            8        10.7     9.7      7.9          0.9       0.4       0.0
    2           11         5.3     7.3      7.9          3.0       1.2       0.9
    3            3         1.8     3.6      5.2          0.5       0.1       1.6
    4            0         0.5     1.4      2.6           —         —         —
    5            1         0.1     0.4      1.0          0.8       0.4       0.0
  over 5         0         0.0     0.1      0.5        2M = 1.0  2M = 3.0  2M = 6.2
  TOTALS        29                                       9.9       5.1       9.4

The sample mean (a sufficient estimator of $\theta$) is in this case 44/29 = 1.52 with a standard
error $\sqrt{\theta/n} = 0.23$.

To apply minimum $\chi'^2$ we have to express the theoretical frequencies in terms of $\theta$.
This results in an unmanageable equation if we then substitute in $\chi'^2$. Instead we calculate
the minimum by finding $\chi'^2$ for some trial values of $\theta$ (in this case 1, 1.5 and 2) and
then interpolating.

The expectations $\lambda$ for the three selected values of $\theta$ are shown in column (3) of the
table and the corresponding $\chi'^2$ in column (4). It is found that, writing $\theta = 1.5 + \epsilon$,
the values of $\chi'^2$ may be represented by the quadratic

$$\chi'^2 = 5.1 - 0.5\epsilon + 18.2\epsilon^2.$$

The minimum of this is given by $\epsilon = 0.01$, and hence our estimate of $\theta$ is 1.51, very close
to the value of 1.52 given by the maximum likelihood estimator.
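
The arithmetic of this example is easily mechanised. The following sketch is not part of the original text and its function names are illustrative. It evaluates $\chi'^2$ from (18.26) at the three trial values and interpolates with a parabola; because it works with unrounded expectations, its figures differ slightly from the rounded entries of the table.

```python
import math

# Observed frequencies l_j for j = 0, 1, ..., 5 firsts; no year had more than 5.
OBSERVED = [6, 8, 11, 3, 0, 1]
N = sum(OBSERVED)  # 29 years


def modified_chi2(theta: float) -> float:
    """chi'^2 of (18.26): occupied cells use l_j as denominator, empty cells add 2M."""
    expected = [N * math.exp(-theta) * theta ** j / math.factorial(j)
                for j in range(len(OBSERVED))]
    tail = N - sum(expected)  # expected frequency in the class "over 5"
    occupied = sum((lam - l) ** 2 / l for lam, l in zip(expected, OBSERVED) if l > 0)
    m = sum(lam for lam, l in zip(expected, OBSERVED) if l == 0) + tail
    return occupied + 2.0 * m


def parabola_minimum(centre: float, step: float) -> float:
    """Minimum of the parabola through chi'^2 at centre - step, centre, centre + step."""
    f_minus = modified_chi2(centre - step)
    f_zero = modified_chi2(centre)
    f_plus = modified_chi2(centre + step)
    slope = (f_plus - f_minus) / (2.0 * step)
    curvature = (f_plus - 2.0 * f_zero + f_minus) / (step * step)
    return centre - slope / curvature


if __name__ == "__main__":
    for theta in (1.0, 1.5, 2.0):
        print(theta, round(modified_chi2(theta), 2))
    print("interpolated minimum:", round(parabola_minimum(1.5, 0.5), 3))
    print("sample mean:", round(44 / 29, 3))
```

Three equally spaced trial values are the least that will determine a parabola; closer spacing about the provisional minimum would sharpen the estimate if more accuracy were wanted.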


18.11. On theoretical grounds there seems no reason to use minimum $\chi^2$ instead of
maximum likelihood. The method has some practical value, however, where the maximum
likelihood equations are difficult to solve. We can usually follow the device of the
example just given, find $\chi^2$ or $\chi'^2$ for some trial values of the parameter, and approximate
to the value which minimises $\chi^2$ or $\chi'^2$. Whether this is easier than finding the maximum
likelihood estimate in the same sort of way depends on the circumstances of the case, but
it may well be so when the frequency function is a tabulated integral, so that expected
frequencies for specified parameter-values can be readily obtained.


18.12. In the manner of 17.39 we can estimate the loss of information occasioned
by the use of minimum $\chi^2$. We have, for the minimum of $\chi^2$,

$$\frac{\partial}{\partial\theta}\,\Sigma\frac{(l - \lambda)^2}{\lambda} = 0,$$

which reduces to

$$\Sigma\frac{l^2 - \lambda^2}{\lambda^2}\,\frac{\partial\lambda}{\partial\theta} = 0. \qquad (18.27)$$

Since $\dfrac{l + \lambda}{\lambda}$ tends to the constant value 2 for large samples, this is equivalent to the
maximum likelihood equation

$$\Sigma\frac{l - \lambda}{\lambda}\,\frac{\partial\lambda}{\partial\theta} = 0, \qquad (18.28)$$

confirming that maximum likelihood and minimum $\chi^2$ give the same results in the limit.


Since 
12 — a2 = 24 (1 — 2) + (1 — A)? 
é log L 


36 from. its mean is 


the deviation of 


12 — 2? OA (lL — a)? 0A 

1 Sf Ly ae 

2 ag es 

the first term vanishing on summation. As in 17.39 we find the variance of this quantity 

log L 
00 


. (18.29) 


is constant. We have 


0 
within samples for which 


var Lk (l — A)? = 22 (k?A?) — 


{2 


wee A 
and on substituting k = 2 73 we find 


“(8 
12( ee r) rn 2 


giving the loss of information. 

As the sample size increases, this quantity remains finite. It is interesting to observe,
however, that as the number of classes increases it also increases without limit, indicating
that minimum $\chi^2$ breaks down for fine grouping.


“Inverse” Probability


18.13. According to Bayes’ theorem (7.24), if $h(\theta)\,d\theta$ is the prior probability of $\theta$,
the posterior probability is given by

$$P(\theta \mid x_1, \dots, x_n) \propto L(x_1, \dots, x_n, \theta)\,h(\theta)\,d\theta. \qquad (18.31)$$

It is then easy to determine the “most probable” value of $\theta$ by maximising $L\,h(\theta)$ if we
know $h(\theta)$. The principles of inference with which we have been concerned up to the
present do not require the notion of the probability of $\theta$ and, even if they did, would not
give any guide to the nature of the function $h(\theta)$. In fact, to an adherent of the frequency
theory of probability, the prior probability of $\theta$ requires the distribution of $\theta$ in some form,
and if $\theta$ is merely an unknown constant it has no distribution (except the trivial one that
$f = 1$ when $\theta$ takes its true value and $f = 0$ elsewhere). The alternative school of thought
assumes the existence of $h(\theta)$ as denoting a prior measure of belief, but, in order to find
the most probable value of $\theta$, has to make some further assumption as to its values comparable
to Bayes’ postulate that for a finite range $h$ is a constant.

We have already noted that on this assumption the maximisation of $L$ is equivalent
to finding the value of $\theta$ with the greatest posterior probability. It is also interesting to
note that, whatever the form of $h(\theta)$, maximum likelihood tends to give the same estimator
as the method of maximising posterior probability for large $n$. In fact, for the maximisation
of $P$ in (18.31) we have

$$\frac{\partial \log P}{\partial\theta} = \frac{\partial \log L}{\partial\theta} + \frac{\partial \log h}{\partial\theta} = 0. \qquad (18.32)$$

In ordinary cases the variance of $\dfrac{\partial \log L}{\partial\theta}$ is of order $n$, whereas the second term is independent
of $n$. In the limit, therefore, the second term is negligible and we are reduced to
the likelihood equation

$$\frac{\partial \log L}{\partial\theta} = 0.$$
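
The way in which the prior term of (18.32) is swamped as $n$ grows may be illustrated by a simple case. The sketch below is not from the original text: it assumes binomial sampling with a prior $h(\theta) \propto \theta^{a-1}(1-\theta)^{b-1}$, for which (18.32) reads $x/\theta - (n-x)/(1-\theta) + (a-1)/\theta - (b-1)/(1-\theta) = 0$ and can be solved explicitly. The function names and the values of $a$ and $b$ are illustrative only.

```python
def maximum_likelihood(successes: int, n: int) -> float:
    """Root of d log L / d theta = 0 for x successes in n binomial trials."""
    return successes / n


def posterior_mode(successes: int, n: int, a: float, b: float) -> float:
    """Root of (18.32) when the assumed prior is proportional to
    theta^(a-1) * (1-theta)^(b-1), with a, b > 1."""
    return (successes + a - 1.0) / (n + a + b - 2.0)


def gap(successes: int, n: int, a: float, b: float) -> float:
    """Absolute difference between the two estimates."""
    return abs(posterior_mode(successes, n, a, b) - maximum_likelihood(successes, n))


if __name__ == "__main__":
    # Same observed proportion (0.3) at increasing sample sizes.
    for n in (10, 100, 1000):
        x = 3 * n // 10
        print(n, round(maximum_likelihood(x, n), 4),
              round(posterior_mode(x, n, a=4.0, b=2.0), 4))
```

With the observed proportion held fixed, the difference between the two estimates falls roughly in proportion to $1/n$, which is the behaviour the argument above leads us to expect.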


Least Squares 


18.14. The method of least squares bears an analogy to minimum $\chi^2$. Suppose
we have an expression depending on a number of unknown parameters $\theta_1 \dots \theta_p$ and
certain observed values $x$. This can be thrown into a form such as

$$k(x, \theta_1, \dots, \theta_p) = 0, \qquad (18.33)$$

where $k$ is a given function (not a frequency function). If we have $n$ values of $x$ and $n > p$
it is not possible to solve the $n$ resulting equations of type (18.33) for the $\theta$'s. We then
consider the “residuals” $k(x_j, \theta_1, \dots, \theta_p)$, and the principle of least squares states that
the values of $\theta_1 \dots \theta_p$ are to be chosen so that

$$\Sigma_j\{k(x_j, \theta_1, \dots, \theta_p)\}^2 = \text{minimum}, \qquad (18.34)$$

or, in other words, so as to satisfy the $p$ equations

$$\Sigma\,k\,\frac{\partial k}{\partial\theta_i} = 0, \qquad i = 1, \dots, p. \qquad (18.35)$$


18.15. Consider the case when the residuals are all distributed normally with variance
$\sigma^2$. The logarithm of the likelihood is then (except for constants)

$$\log L = -n\log\sigma - \frac{1}{2\sigma^2}\Sigma k^2(x, \theta_1, \dots, \theta_p), \qquad (18.36)$$

and this is clearly maximised by minimising the sum (18.34). In this case, then, the method
of least squares is equivalent to the method of maximum likelihood. In other cases it
may give different results, and the justification for using it then becomes more or less
empirical.


18.16. The most important case occurring in statistical theory of the use of the
method of least squares concerns regression equations. We have already seen that the
coefficients of regression are, in effect, determined so as to minimise the sum of squares of
residuals (cf. 15.2). We also know that, for the multiple normal distribution, residuals
from the population regression lines are, in fact, normally distributed (15.13). For normal
variation, therefore, the method of least squares is equivalent to maximum likelihood so
far as concerns the simultaneous estimation of regression coefficients.


18.17. This is a convenient point to prove a theorem (due to Gauss) which in one
form or another is constantly occurring in statistical theory, particularly in connection
with the normal distribution. Suppose we have a population (not necessarily normal)
in which the regression of one variate $y$ on the others $x_0\,(= 1), x_1, \dots, x_p$ is given by

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p. \qquad (18.37)$$

The $x$'s may be correlated among themselves and, in the extreme case, functionally related,
so that this case includes that of curvilinear regression for our present purposes. Suppose
that we have a sample of $n$ values, where $n > p$. Denoting by $\Sigma$ summation over these
$n$ values, we determine the estimates of the $\beta$'s by minimising the sum of squares, e.g.

$$\Sigma(y - \beta_0 - \beta_1 x_1 - \dots - \beta_p x_p)^2.$$

Suppose that $b_0 \dots b_p$ are the solutions of this process. Then our regression formula is

$$y - b_0 - b_1 x_1 - \dots - b_p x_p = 0. \qquad (18.38)$$

The observed residuals, obtained by substituting the observed values in this equation,
are typified by

$$e = y - b_0 - b_1 x_1 - \dots - b_p x_p, \qquad (18.39)$$

whereas the “real” residuals are typified by

$$\epsilon = y - \beta_0 - \beta_1 x_1 - \dots - \beta_p x_p. \qquad (18.40)$$

We proceed to compare the sampling variances of $e$ and $\epsilon$ and to show that

$$\operatorname{var}\epsilon = \frac{n}{n - p - 1}\operatorname{var} e, \qquad (18.41)$$

provided that the residuals are uncorrelated.
Let us transform the observed values of the 2’s to new values &,, &,... &,, (efor 
each) such that 
& (a; €,) = 1 jak 
0 j =} : . ; . (18.42) 
x (Ey) = 0, 


This involves, for each £, p + 1 equations in m unknowns and is therefore possible in general. 
We then have 


— 2, (e —e) = YE, { (Bo — bo) + (81 — bi) 11 +. “ire (65 — On) aan 


= By — by 
But 2&,e = 2 (by) — LE, {bo +b. 4,4... dy ty} 

= b, — b, = 0. 
Hence By — 6, = —L&e. : : : - (18.43) 
Now — Ye(e—e)=2L fy —b— ... — bd, a} {(B.—H) +... (6, —0,) an 


== (0). 
since the summations give terms the vanishing of which determines the b’s. Hence 


269 — 2, 62 == 28 eye 
j 




where S denotes summation over the (p + 1) values of j, 
=SLEeLuaye 
= 8 {2 &, x, «*} + cross-product terms in e, 
= Se? + cross-product terms. 
When we take expectations the cross-product terms vanish since the residuals are uncorre- 
lated. Hence 
KE (2 «?) — #(S 2?) = £ 2X e?, 
or (n — p — 1) vare = nvare, A . » (18.44) 
from which (18.41) follows at once. 
For normal variation we shall consider this result from a slightly different viewpoint 
in Chapter 22. 
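
The relation between the variances of the observed and the real residuals can be illustrated by simulation in the simplest case of one regressor ($p = 1$). The sketch below is not part of the original text; the function names, the regression coefficients and the choice of normal residuals are illustrative assumptions (the theorem itself requires only that the residuals be uncorrelated). On the theorem, the mean of the sum of squared observed residuals should be close to $(n - p - 1)$ times the variance of the real residuals.

```python
import random


def residual_sum_of_squares(xs, ys):
    """Sum of squared observed residuals e after a least-squares straight-line fit."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))


def mean_rss(n: int, sigma: float, reps: int, seed: int = 0) -> float:
    """Average residual sum of squares over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    xs = [float(i) for i in range(n)]
    total = 0.0
    for _ in range(reps):
        # Real residuals: independent normal deviates with variance sigma^2.
        ys = [1.0 + 0.5 * x + rng.gauss(0.0, sigma) for x in xs]
        total += residual_sum_of_squares(xs, ys)
    return total / reps


if __name__ == "__main__":
    n, sigma = 10, 1.0
    print("mean of sum e^2:", round(mean_rss(n, sigma, reps=20000), 3))
    print("(n - p - 1) * var(real residual):", (n - 2) * sigma ** 2)
```

Dividing the average by $n$ gives the variance of the observed residuals, and multiplying that by $n/(n - p - 1)$ recovers the variance of the real residuals, as in (18.41).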


NOTES AND REFERENCES 
The approach to minimum-variance estimators through the Calculus of Variations is
due to Aitken and Silverstone (1942). For minimum $\chi^2$ see K. Smith (1916) and R. A.
Fisher (1922a, 1925b). For the modification $\chi'^2$ see Jeffreys (1938b, 1939b, 1941).
A method of estimation essentially depending on the median has been proposed for 
use in quality control, but its value is as yet problematical. For an account of the technique 
see Simon (1941). 


EXERCISES 
18.1. From the property that the variance of a minimum-variance estimator is
equal to $\lambda$ show that the most general distribution for which the sample mean is a sufficient
estimator is

$$f(x, \theta) = c(x, \sigma)\exp\left\{-\frac{1}{2\sigma^2}(x - \theta)^2\right\},$$

where $c$ is an arbitrary function and $\sigma^2$ is the variance of $f$.
Hence show that no Pearson curve other than the normal admits the sample-mean
as a sufficient estimator, but that a Gram-Charlier series may do so.
(Aitken and Silverstone, 1942.)


18.2. If the function / exists and 


® dé 
a (8) AY 
show that the variance of the estimator ¢ is 
|e | 
Mm o2 
where g is the function of 18.7. (Aitken and Silverstone, 1942.) 


18.3. If a population $(p + q)^4$ is regarded as distributed in 5 classes, show that the
intrinsic accuracy is $4/(pq)$. Show further that the loss of information through estimating
$p$ from minimum $\chi^2$ is
5 Ae 2)" 4 _ 9n3 18n2 g2 — 2n03 4\2 
age 8 PI + Be) “op? @? (p* — 2p2.q + 18p*q pg® + g*)?. 
This is least when p = q and is then equivalent to the loss of 5 observations. 
(Fisher, 1925b.)


CHAPTER 19 
CONFIDENCE INTERVALS 


19.1. In the previous two chapters we have been concerned with methods which 
will provide an estimate of the value of one or more unknown parameters ; and the methods 
gave functions of the sample values—the estimators—which, for any given sample, pro- 
vided a unique estimate. It was of course fully recognised that the estimate might differ 
from the parameter in any particular case, and hence that there was a margin of uncer- 
tainty. The extent of this uncertainty was expressed in terms of the sampling variance 
of the estimator. With the somewhat intuitional approach which has served our purpose
up to this point, we say that it is probable that $\theta$ lies in the range $t \pm \sqrt{\mathrm{var}\,t}$, very probable
that it lies in the range $t \pm 2\sqrt{\mathrm{var}\,t}$, and so on. In short, what we have done is in effect
to locate $\theta$ in a range and not at a particular point, although we have regarded one point
in the range, viz. $t$ itself, as having a claim to be considered as the "best" estimate of $\theta$.


19.2. In the present chapter we shall examine the logic of this procedure more 
closely and look at the problem of estimation from a different point of view. We now 
abandon attempts to estimate $\theta$ by a function which, for a specified sample, gives a unique
number. Instead we shall consider merely the specification of a range in which $\theta$ lies.
We shall not attempt to specify whereabouts in the interval the value of $\theta$ really is; all
values in the range have an equal claim to be taken as the "true" value. Nor shall we
assess the probability that $\theta$ lies in the interval in the sense that $\theta$ is regarded as a random
variable. In fact, in the frequency theory of probability $\theta$ is not a random variable (except
trivially in that the frequency of $\theta$ is unity when it takes the true value and is zero elsewhere).
Nevertheless, probability plays an essential part in the determination of the
interval and in the degree of confidence we have that it "covers" $\theta$.


Case of one Unknown Parameter 


19.3. Consider in the first place a population dependent on a single unknown parameter
$\theta$ and suppose that we are given a random sample of $n$ values $x_1, \ldots, x_n$ from the
population. Let $z$ be a statistic dependent on the $x$'s and on $\theta$, whose sampling distribution
is independent of $\theta$. (The examples given below will show that in some cases at least such
a statistic may be found.) Then, given any probability $\alpha$, we can find a value $z_\alpha$ such that

$$\int_{-\infty}^{z_\alpha} dF(z) = \alpha,$$

and this is true whatever the value of $\theta$. In the notation of the theory of probability we
shall then have

$$P\{z \le z_\alpha \mid \theta\} = \alpha. \qquad (19.1)$$
Now it may happen that the inequality $z \le z_\alpha$ can be transformed to the form $\theta \le t_\alpha$ or
$\theta \ge t_\alpha$, where $t_\alpha$ is some function depending on the value $z_\alpha$ and the $x$'s but not on $\theta$. For
instance, if $z = \bar{x} - \theta$ we shall have

$$\bar{x} - \theta \le z_\alpha$$
and hence
$$\theta \ge \bar{x} - z_\alpha.$$
If this transformation can be made we then have, from (19.1),

$$P\{\theta \le t_\alpha \mid \theta\} = \alpha. \qquad (19.2)$$

More generally, suppose that we can find a function $t_\alpha$, depending on $\alpha$ and the $x$'s
but not on $\theta$, such that (19.2) is true for all $\theta$. Then we may use this equation in probability
to make certain statements about $\theta$.


19.4. Note, in the first place, that we cannot assert that the probability is $\alpha$ that
$\theta$ does not exceed a constant $t_\alpha$. This statement (in the frequency theory of probability)
can only relate to the variation of $\theta$ in a population of $\theta$'s, and in general we do not know
that $\theta$ varies at all. If it is merely an unknown constant then the probability that $\theta \le t_\alpha$
is either unity or zero. We do not know which of these values is correct, but we do know
that one of them is correct.

We therefore look at the matter in another way. Although $\theta$ is not a random variable,
$t_\alpha$ is and will vary from sample to sample. Consequently, if we assert that $\theta \le t_\alpha$ in each
case presented for decision, we shall be right in a proportion $\alpha$ of the cases in the long run.
A statement about the probability that $\theta$ is less than or equal to some assigned value
has no meaning except in the trivial sense already mentioned; but the statement that
a statistic $t_\alpha$ is greater than or equal to $\theta$ (whatever $\theta$ happens to be) has a definite probability
$\alpha$ of being correct. If therefore we make it a rule to assert the inequality $\theta \le t_\alpha$
for any sample values which arise, we have the assurance of being right in a proportion
$\alpha$ of the cases "on the average" or "in the long run."

This idea is basic to the theory of confidence intervals which we proceed to develop,
and the reader should satisfy himself that he has grasped it.
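
The long-run reading of the rule can be illustrated by simulation. The sketch below is not part of the original text: it assumes a normal population with unit variance, takes $z = \bar{x} - \theta$ as the statistic, and uses hypothetical names (`one_sided_coverage`, `trials`, `seed`). It counts how often the assertion $\theta \le t_\alpha$ turns out true when $t_\alpha$ is computed from each sample alone.

```python
import random
from statistics import NormalDist


def one_sided_coverage(theta, n, alpha, trials, seed=0):
    """Fraction of simulated samples for which the assertion theta <= t_alpha holds.

    Assumes unit-variance normal data, so that xbar - theta is N(0, 1/n).
    """
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(alpha)  # alpha-quantile of the standard normal
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(theta, 1.0) for _ in range(n)) / n
        t_alpha = xbar + q / n ** 0.5  # depends on the sample and alpha only
        hits += theta <= t_alpha
    return hits / trials


# The observed proportion should be close to alpha whatever theta is.
for theta in (-2.0, 0.0, 5.0):
    print(theta, one_sided_coverage(theta, n=10, alpha=0.9, trials=20000))
```

The value of `theta` is fixed inside each run; only `t_alpha` varies from sample to sample, which is the point made in the text.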


19.5. To simplify the exposition we have considered only a single quantity $t_\alpha$ and
the statement that $\theta \le t_\alpha$. In practice, however, we usually seek for two quantities $t_0$
and $t_1$, such that

$$P\{t_0 \le \theta \le t_1 \mid \theta\} = \alpha, \qquad (19.3)$$

and make the assertion that $\theta$ lies in the range $t_0$ to $t_1$. These quantities are known as the
Lower and Upper Confidence Limits respectively. They depend only on $\alpha$ and the sample
values. For any fixed $\alpha$ the totality of values of $t_0$ and $t_1$ for different samples determine
a field within which $\theta$ is asserted to lie. This field is called the Confidence Belt or Region
of Acceptance. We shall give a graphical representation of the idea below. The number
$\alpha$ is called the Confidence Coefficient.


Example 19.1
Suppose we have a sample of $n$ from the normal population with unit variance

$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}\,dx, \qquad -\infty < x < \infty.$$
The distribution of means $\bar{x}$ will be
$$dF = \sqrt{\frac{n}{2\pi}} \exp\{-\tfrac{n}{2}(\bar{x} - \mu)^2\}\,d\bar{x}, \qquad -\infty < \bar{x} < \infty.$$

From the tables of the normal integral we know that the probability of a positive deviation
from the mean not greater than twice the standard deviation is 0.97725. We have
then
$$P\{\bar{x} - \mu \le 2/\sqrt{n} \mid \mu\} = 0.97725,$$
which is equivalent to
$$P\{\bar{x} - 2/\sqrt{n} \le \mu \mid \mu\} = 0.97725.$$
Thus, if we assert that $\mu$ is greater than or equal to $\bar{x} - 2/\sqrt{n}$ we shall be right in about
97.725 per cent. of the cases.
Similarly we have
$$P\{\bar{x} - \mu \ge -2/\sqrt{n} \mid \mu\} = P\{\mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 0.97725.$$
Hence, combining the two results,
$$P\{\bar{x} - 2/\sqrt{n} \le \mu \le \bar{x} + 2/\sqrt{n} \mid \mu\} = 2(0.97725) - 1 = 0.9545.$$
Hence, if we assert that $\mu$ lies in the range $\bar{x} \pm 2/\sqrt{n}$ we shall be right in about 95.45 per
cent. of the cases in the long run.
Conversely, given the confidence coefficient we can easily find from the tables of the
normal integral the deviation $d$ such that
$$P\{\bar{x} - d/\sqrt{n} \le \mu \le \bar{x} + d/\sqrt{n} \mid \mu\} = \alpha.$$
For instance, if $\alpha = 0.8$, $d = 1.28$, so that if we assert that $\mu$ lies in the range $\bar{x} \pm 1.28/\sqrt{n}$ the odds
are 4 to 1 that we shall be right.
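
The tabular look-ups in this example can be reproduced with the standard normal distribution in Python's standard library. This is a minimal sketch, not the book's procedure; the function names `central_deviation` and `central_limits` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def central_deviation(alpha):
    """Deviation d such that P(|xbar - mu| <= d / sqrt(n)) = alpha."""
    return STD_NORMAL.inv_cdf((1 + alpha) / 2)


def central_limits(xbar, n, alpha):
    """Central confidence limits for mu from a unit-variance normal sample."""
    d = central_deviation(alpha)
    return xbar - d / sqrt(n), xbar + d / sqrt(n)


print(round(STD_NORMAL.cdf(2.0), 5))          # 0.97725
print(round(2 * STD_NORMAL.cdf(2.0) - 1, 4))  # 0.9545
print(round(central_deviation(0.8), 2))       # 1.28
```

For a hypothetical sample of 25 with mean 10, `central_limits(10.0, 25, 0.9545)` gives limits close to 9.6 and 10.4, that is $\bar{x} \pm 2/\sqrt{n}$.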

The reader to whom this approach is new will probably ask: but is this not a round- 
about way of using the standard error to set limits to an estimate of the mean? In a 
way, it is. In effect, what we have done in this example is to show how the use of the 
standard error of the mean in normal samples may be justified on logical grounds without 
appeal to new principles of inference other than those incorporated in the theory of proba- 
bility itself. In particular we make no use of Bayes’ postulate. 

Another point of interest in this example is that the upper and lower confidence limits
derived above are equidistant from the mean $\bar{x}$. This is not by any means necessary,
and it is easy to see that we can derive any number of alternative limits for the same confidence
coefficient $\alpha$. Suppose, for instance, we take $\alpha = 0.9545$, and select two numbers
$\alpha_0$ and $\alpha_1$, which obey the condition

$$\alpha_0 + \alpha_1 - 1 = 0.9545,$$
say $\alpha_0 = 0.9645$ and $\alpha_1 = 0.99$. From the tables of the normal integral we have

$$P\{\bar{x} - \mu \le 2.326/\sqrt{n} \mid \mu\} = 0.99,$$
$$P\{\bar{x} - \mu \ge -1.806/\sqrt{n} \mid \mu\} = 0.9645,$$

and hence

$$P\{\bar{x} - 2.326/\sqrt{n} \le \mu \le \bar{x} + 1.806/\sqrt{n} \mid \mu\} = 0.9545.$$

Thus, with the same confidence coefficient we can assert that $\mu$ lies in the range $\bar{x} - 2/\sqrt{n}$
to $\bar{x} + 2/\sqrt{n}$, or in the range $\bar{x} - 2.326/\sqrt{n}$ to $\bar{x} + 1.806/\sqrt{n}$. In either case we shall be
right in about 95.45 per cent. of the cases.

We note that in the first case the range is $4/\sqrt{n}$ units and in the second case it is
$4.132/\sqrt{n}$ units. Other things being equal, we should choose the first set of limits since
they locate the parameter in a narrower range. We shall consider this point in more
detail below. It does not always happen that there is an infinity of possible confidence
limits or, if there is, that any simple rule of choice between them can be formulated.
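
The comparison of widths can be checked numerically. In this sketch the two one-sided probabilities are passed explicitly; `p_lower` and `p_upper` are hypothetical parameter names for the probabilities attached to the lower and upper limit, and the confidence coefficient is `p_lower + p_upper - 1`.

```python
from math import sqrt
from statistics import NormalDist


def normal_mean_limits(xbar, n, p_lower, p_upper):
    """Limits for mu with P(lower <= mu) = p_lower and P(mu <= upper) = p_upper.

    Assumes a unit-variance normal population.
    """
    z = NormalDist()
    lower = xbar - z.inv_cdf(p_lower) / sqrt(n)
    upper = xbar + z.inv_cdf(p_upper) / sqrt(n)
    return lower, upper


def width_in_units(p_lower, p_upper):
    """Width of the range in units of 1 / sqrt(n)."""
    lower, upper = normal_mean_limits(0.0, 1, p_lower, p_upper)
    return upper - lower


print(round(width_in_units(0.97725, 0.97725), 3))  # central limits
print(round(width_in_units(0.99, 0.9645), 3))      # non-central limits
```

Both choices have the same coefficient 0.9545, but the central pair gives the shorter range, in agreement with the figures 4 and 4.132 quoted above.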


Graphical Representation 


19.6. In a number of simple cases, including that of the previous example, the confidence
limits can be represented in a useful graphical form. We take two orthogonal
axes, OX relating to the observed $\bar{x}$ and OY to $\mu$ (see Fig. 19.1).

Fig. 19.1.

The two straight lines shown have as their equations
$$\mu = \bar{x} + 2, \qquad \mu = \bar{x} - 2.$$
Consequently, for any point between the lines,
$$\bar{x} - 2 \le \mu \le \bar{x} + 2.$$

Hence, if for any observed $\bar{x}$ we read off the two ordinates on the lines corresponding to
that value we obtain the two confidence limits. The vertical interval between the limits
is the confidence range (shown in the diagram for $\bar{x} = 1$), and the total zone between the
lines is the confidence belt. We may refer to the two lines as the Upper and Lower
Confidence lines respectively.

This example relates to the somewhat trivial case $n = 1$. For different values of $n$
there will be different confidence lines, all parallel to $\mu = \bar{x}$. They may be shown on a
single diagram for selected values of $n$, and a figure so constructed provides a useful method
of reading off confidence limits in practical work.



Central and Non-central Intervals 


19.7. In Example 19.1 the sampling distribution on which the confidence intervals
were based was symmetrical, and hence, by taking equal deviations from the mean, we
reached equal areas of the frequency function as $\alpha_0$ and $\alpha_1$. In general we cannot achieve
this result with equal deviations, and subject always to the condition $\alpha_0 + \alpha_1 - 1 = \alpha$
the two quantities may be chosen arbitrarily.

If $\alpha_0$ and $\alpha_1$ are taken to be equal, we shall say that the intervals are central. In such
a case we have

$$P(t_0 \le \theta) = P(\theta \le t_1) = \frac{1 + \alpha}{2}. \qquad (19.4)$$

In the contrary case the intervals will be called non-central.


19.8. In the absence of other considerations it is usually convenient to employ 
central intervals, but circumstances sometimes arise in which non-central intervals are 
more serviceable. Suppose, for instance, we are estimating the proportion of some drug 
in a medicinal preparation and the drug is toxic in large doses. We must then clearly 
err on the safe side, an excess of the true value over our estimate being more serious than 
a deficiency. In such a case we might prefer to take $\alpha_1$ very near to unity or even equal
to unity, so that

$$P(\theta \le t_1) = 1, \qquad P(t_0 \le \theta) = \alpha,$$
and we are certain that $\theta$ is not greater than $t_1$.

Again, if we are estimating the proportion of viable seed in a sample of material that 
is to be placed on the market, we are more concerned with the accuracy of the lower limit 
than that of the upper limit, for a deficiency of germination is more serious than an excess 
from the grower's point of view. In such circumstances we should probably take $\alpha_0$ as
large as conveniently possible so as to be nearer to certainty about the minimum value
of viability. This kind of situation often arises in the specification of the quality of a 
manufactured product, the seller wishing to guarantee a minimum standard but being 
much less concerned with whether his product exceeds expectation. 


19.9. On a somewhat similar point, it may be remarked that in certain circumstances
it is enough to know that $P\{t_0 \le \theta \le t_1 \mid \theta\}$ exceeds some quantity $\alpha$. We then
know that in asserting $\theta$ to lie in the range $t_0$ to $t_1$ we shall be right in at least a proportion
$\alpha$ of the cases. Mathematical difficulties in ascertaining confidence limits exactly for
given $\alpha$, or theoretical difficulties when the distribution is discontinuous, may, for example,
lead us to be content with the inequality rather than the equality of (19.3).


Example 19.2 


To find confidence intervals for the parent proportion $\varpi$ of successes in sampling for
attributes.

In samples of $n$ the distribution of successes is given by the binomial $(\chi + \varpi)^n$, where $\chi = 1 - \varpi$. We
will determine the limits for the case $n = 20$ and confidence coefficient 0.95.

We require in the first instance the distribution function of the binomial, which is 
obtainable from Table 5.2 (vol. I, p. 119). Summing the number of successes and dividing 
by 10,000, we find from that table the following :— 


Proportion of
successes p    $\varpi$ = 0.1   $\varpi$ = 0.2   $\varpi$ = 0.3   $\varpi$ = 0.4   $\varpi$ = 0.5

0.00           0.1216      0.0115      0.0008      -           -
0.05           0.3918      0.0691      0.0076      0.0005      -
0.10           0.6770      0.2060      0.0354      0.0036      0.0002
0.15           0.8671      0.4114      0.1070      0.0159      0.0013
0.20           0.9569      0.6296      0.2374      0.0509      0.0059
0.25           0.9888      0.8042      0.4163      0.1255      0.0207
0.30           0.9977      0.9133      0.6079      0.2499      0.0577
0.35           0.9997      0.9678      0.7722      0.4158      0.1316
0.40           1.0001      0.9900      0.8866      0.5955      0.2517
0.45           1.0002      0.9974      0.9520      0.7552      0.4119
0.50           -           0.9994      0.9828      0.8723      0.5881
0.55           -           0.9999      0.9948      0.9433      0.7483
0.60           -           1.0000      0.9987      0.9788      0.8684
0.65           -           -           0.9997      0.9934      0.9423
0.70           -           -           0.9999      0.9983      0.9793
0.75           -           -           -           0.9996      0.9941
0.80           -           -           -           0.9999      0.9987
0.85           -           -           -           -           0.9998
0.90           -           -           -           -           1.0000
0.95           -           -           -           -           -


The final figures may be a unit or two in error owing to rounding up, but that need
not bother us to the degree of approximation here considered. Values for $\varpi = 0.6$ to 0.9
may be obtained by symmetry.
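
A table of this kind can be recomputed directly from the binomial terms rather than summed from a printed frequency table. The following is a sketch under that assumption; `binom_cdf` and `cdf_table` are hypothetical helper names, and the computed values may differ from the printed ones by a unit or two in the last place because the printed table accumulates rounded frequencies.

```python
from math import comb


def binom_cdf(k, n, prob):
    """P(X <= k) for X distributed as Binomial(n, prob)."""
    return sum(comb(n, j) * prob**j * (1 - prob) ** (n - j) for j in range(k + 1))


def cdf_table(n=20, probs=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Rows of (proportion k/n, [P(p <= k/n) for each parent proportion])."""
    return [(k / n, [binom_cdf(k, n, q) for q in probs]) for k in range(n + 1)]


# Print the distribution function to four decimals, one row per proportion.
for proportion, row in cdf_table():
    print(f"{proportion:4.2f}  " + "  ".join(f"{value:6.4f}" for value in row))
```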

We note in the first place that the variate $p$ is discontinuous. On the other hand
we are prepared to consider any value of $\varpi$ in the range 0 to 1. For given $\varpi$ we cannot
in general find limits to $p$ for which $\alpha$ is exactly 0.95; but we will take $p$ to be the nearest
multiple of 0.05 which gives confidence coefficients at least equal to 0.95, so as to be on
the safe side. We will consider only central intervals, so that for given $\varpi$ we have to find
$p_0$ and $p_1$ such that

$$P\{\varpi > p_0\} \ge 0.975,$$
$$P\{\varpi < p_1\} \ge 0.975,$$

the inequalities for $P$ being as near to equality as we can make them.

Consider the diagrammatic representation of the type shown in Fig. 19.1 and given
for our present case in Fig. 19.2.

From the table we can find, for any assigned $\varpi$, the values $\varpi_0$ and $\varpi_1$ such that
$P(p > \varpi_0) \ge 0.975$ and $P(p < \varpi_1) \ge 0.975$. Note that in determining $\varpi_0$ the distribution
function gives the probability of obtaining a proportion $p$ or less successes, so that the
complement of the function gives the probability of a proportion $1 - p - 0.05$ or less
(not $1 - p$). Here, for example, on the horizontal through $\varpi = 0.1$ we find $\varpi_0 = 0$ and
$\varpi_1 = 0.30$ from our table; and for $\varpi = 0.4$ we have $\varpi_0 = 0.15$ and $\varpi_1 = 0.65$. The points
so obtained lie on stepped curves which have been drawn in. The zone between them is
the confidence belt. For any $p$ the probability that we shall be wrong in locating $\varpi$ inside
the belt is at the most 0.05. We determine $p_0$ and $p_1$ by drawing a vertical at the given
value of $p$ on the abscissa and reading off the values where it intersects the curves. That
these are, in fact, the required limits will be shown in a moment.
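
The horizontal construction of the stepped curves can be sketched as follows. The code assumes the reading used in the worked figures above, namely strict inequalities inside the probabilities, and it returns counts of successes out of $n$ rather than proportions; the function name `step_limits` and the fallback to 0 when no lower point exists are assumptions made for illustration.

```python
from math import comb


def binom_cdf(k, n, prob):
    """P(X <= k) for X distributed as Binomial(n, prob)."""
    return sum(comb(n, j) * prob**j * (1 - prob) ** (n - j) for j in range(k + 1))


def step_limits(prob, n=20, level=0.975):
    """Counts (k0, k1) with P(X > k0) >= level and P(X < k1) >= level.

    k0 is the largest such count (0 if none exists) and k1 the smallest,
    so both inequalities are as near to equality as the steps allow.
    """
    k0 = 0
    for k in range(n + 1):
        if 1 - binom_cdf(k, n, prob) >= level:
            k0 = k
    k1 = next(k for k in range(1, n + 2) if binom_cdf(k - 1, n, prob) >= level)
    return k0, k1


for prob in (0.1, 0.4):
    k0, k1 = step_limits(prob)
    print(prob, k0 / 20, k1 / 20)
```

With $n = 20$ the counts (0, 6) and (3, 13) correspond to the proportions 0 and 0.30 for a parent proportion of 0.1, and 0.15 and 0.65 for 0.4, matching the values read from the table.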




Fig. 19.2. (Abscissa: values of $p$.)

We could have found more precise confidence limits by interpolating in the table
obtained above. For example, with $p = 0.30$ we see that

for $\varpi = 0.1$, $P = 0.9977$

for $\varpi = 0.2$, $P = 0.9133$.
Hence, for $P = 0.975$ we have approximately

$$\varpi = 0.1 + \frac{0.9977 - 0.9750}{0.9977 - 0.9133} \times 0.1,$$

and closer approximations can be obtained if desired. The corresponding point on the
lower confidence line to $\varpi = 0.127$ is $p = 0.35$. Calculations on these lines give us the
values of $\varpi$ such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ exactly},$$
whereas the former approach gave values such that

$$P\{p_0 \le \varpi \le p_1\} = \alpha \text{ approximately}, \quad \ge \alpha \text{ in any case}.$$
Discontinuous variates usually give rise to this sort of arithmetical nuisance, but the
approximation in practice is sufficiently good, except for very small samples. The broken
curves in Fig. 19.2 give the more precise limits. They lie, of course, inside the more
approximate step-curves.
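
The interpolation step, and one way of obtaining a closer approximation, can be sketched in code. The linear formula uses the two tabulated values quoted in the text; the bisection routine is an added illustration (not the book's method) that solves for the parent proportion directly from the binomial sum. The names `interpolate` and `solve_parent` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, prob):
    """P(X <= k) for X distributed as Binomial(n, prob)."""
    return sum(comb(n, j) * prob**j * (1 - prob) ** (n - j) for j in range(k + 1))


def interpolate(target, x_lo, f_lo, x_hi, f_hi):
    """Linear interpolation for the x at which a decreasing f equals target."""
    return x_lo + (f_lo - target) / (f_lo - f_hi) * (x_hi - x_lo)


def solve_parent(k, n, target, lo=0.0, hi=1.0, tol=1e-10):
    """Bisection for the parent proportion q with P(X <= k | q) = target.

    Relies on P(X <= k | q) decreasing in q when k < n.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# p = 0.30 with n = 20 corresponds to 6 successes or fewer.
print(round(interpolate(0.975, 0.1, 0.9977, 0.2, 0.9133), 3))  # 0.127
print(solve_parent(6, 20, 0.975))  # closer approximation, between 0.1 and 0.2
```

Linear interpolation over so wide a tabular interval is rough, so the bisection root need not agree closely with 0.127; it is the value at which the binomial sum itself equals 0.975.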

It is, perhaps, worth noticing that the points on the curves of Fig. 19.2 were constructed
by selecting an ordinate $\varpi$ and then finding the corresponding abscissae $\varpi_0$ and $\varpi_1$. The
diagram is, so to speak, constructed horizontally. In applying it, however, we read it
vertically, that is to say, with observed abscissa $p$ we read off two values $p_0$ and $p_1$ and
assert that $p_0 \le \varpi \le p_1$. It is instructive to observe how this change of viewpoint can
be justified without reference to Bayes' postulate.

Consider Fig. 19.3, which shows a pair of confidence lines for the binomial. Let $\varpi'$
be a given value of $\varpi$ and let the horizontal through $\varpi'$ meet the confidence lines in points
with abscissae $\varpi_0$ and $\varpi_1$. Then we know that in repeated samples from a population
with parameter $\varpi'$ a proportion $\alpha$ will give observed values of $p$ lying between $\varpi_0$ and $\varpi_1$;
for the curves were constructed so that this should be so.

Fig. 19.3. (Ordinate: values of $\varpi$; abscissa: values of $p$.)

Now since the horizontal at $\varpi'$ lies entirely within the confidence belt for $\varpi_0 \le p \le \varpi_1$
(and does so for any $\varpi'$), it follows that the assertion that $\varpi'$ lies in the belt is correct if,
and only if, $p$ lies between $\varpi_0$ and $\varpi_1$, that is in a proportion $\alpha$ of the cases. This, being
true for any $\varpi'$, is true for all $\varpi'$, irrespective of the relative frequency of occurrence of the
$\varpi$'s under estimate. Consequently our assertion that $\varpi$ lies in the confidence belt is correct
in a proportion $\alpha$ of the cases; and, in particular, for any observed $p$ we may assert that
$\varpi$ lies within the ordinates determined on the two curves by the vertical through $p$.


Confidence Intervals for Large Samples 
19.10. In our usual notation, the logarithm of the likelihood function gives

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta), \qquad (19.5)$$

$$\frac{\partial \log L}{\partial \theta} = \Sigma \frac{\partial \log f}{\partial \theta}. \qquad (19.6)$$

We may regard $\partial \log L/\partial\theta$ as a random variable, and in particular write

$$nA = \mathrm{var}\left(\frac{\partial \log L}{\partial \theta}\right),$$

so that
$$A = \mathrm{var}\left(\frac{\partial \log f}{\partial \theta}\right). \qquad (19.7)$$

Write
$$\psi = \frac{\partial \log L/\partial \theta}{\sqrt{nA}}. \qquad (19.8)$$
Then, for large samples, $\psi$ will be distributed normally in the limit with unit variance, in
virtue of the Central Limit Theorem, under very general conditions. It will also have
zero mean, since

$$E\left(\frac{\partial \log f}{\partial \theta}\right) = E\left(\frac{1}{f}\frac{\partial f}{\partial \theta}\right) = \int \frac{\partial f}{\partial \theta}\,dx = \frac{\partial}{\partial \theta}\int f\,dx = \frac{\partial}{\partial \theta}(1) = 0. \qquad (19.9)$$

Hence, from the distribution of $\psi$ we may easily determine confidence limits for $\theta$ in large
samples if $\psi$ is a monotonic function of $\theta$, so that inequalities in one may be transformed to
inequalities in the other.

It is sufficient (but not necessary) for the existence of the normal limit to $\psi$ that
$\partial f/\partial\theta$ exists for all $x$, except perhaps at isolated points, that the range is independent of $\theta$ and
that the Central Limit Theorem applies (e.g. if the third moment of $\partial \log f/\partial\theta$ exists). We
also assume, as usual, that differentiation under the integral sign, as in (19.9), is legitimate.


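Example 19.3
Consider again the problem of Example 19.1. We have, with $\mu$ for $\theta$,

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\},$$
$$\frac{\partial \log f}{\partial \mu} = x - \mu,$$
$$\mathrm{var}\left(\frac{\partial \log f}{\partial \mu}\right) = \int_{-\infty}^{\infty} (x - \mu)^2 f\,dx = 1.$$
Hence
$$\psi = \frac{\Sigma (x - \mu)}{\sqrt{n}} = (\bar{x} - \mu)\sqrt{n}$$

is normally distributed with unit variance for large $n$. (We know, of course, that this
is true for small $n$ as well in this particular case.) The confidence limits may then be set
as in Example 19.1.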


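Example 19.4
Consider the Poisson distribution whose general term is

$$f(x, \lambda) = \frac{e^{-\lambda}\lambda^x}{x!}.$$

We have
$$\frac{\partial \log f}{\partial \lambda} = \frac{x}{\lambda} - 1,$$
$$\mathrm{var}\left(\frac{\partial \log f}{\partial \lambda}\right) = \sum_{x=0}^{\infty} \left(\frac{x}{\lambda} - 1\right)^2 \frac{e^{-\lambda}\lambda^x}{x!} = \frac{1}{\lambda}.$$

Hence
$$\psi = \frac{\Sigma (x/\lambda - 1)}{\sqrt{n/\lambda}} = (\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}}.$$

For example, with $\alpha = 0.95$, corresponding to a normal deviate $\pm 1.96$, we have, for the
central confidence limits,

$$(\bar{x} - \lambda)\sqrt{\frac{n}{\lambda}} = \pm 1.96,$$

giving, on solution for $\lambda$,

$$\lambda = \bar{x} + \frac{1.92}{n} \pm \sqrt{\frac{3.84\,\bar{x}}{n} + \frac{3.69}{n^2}},$$

the ambiguity in the square root giving upper and lower limits respectively.
To order $n^{-1/2}$ this is equivalent to

$$\lambda = \bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}},$$

from which the upper and lower limits are seen to be equidistant from the mean $\bar{x}$, as we
should expect.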
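
The solution of the quadratic for the Poisson limits can be sketched numerically. The function name `poisson_limits` and the sample figures are hypothetical; the code solves $n(\bar{x} - \lambda)^2 = z^2\lambda$ for $\lambda$ and also returns the simpler limits obtained by dropping terms beyond order $n^{-1/2}$.

```python
from math import sqrt


def poisson_limits(xbar, n, z=1.96):
    """Large-sample central limits for a Poisson parameter.

    Returns (exact_pair, approx_pair): the two roots of
    n * (xbar - lam)**2 == z**2 * lam, and xbar -/+ z * sqrt(xbar / n).
    """
    half = z * z / (2 * n)
    root = sqrt(z * z * xbar / n + half * half)
    exact = (xbar + half - root, xbar + half + root)
    approx = (xbar - z * sqrt(xbar / n), xbar + z * sqrt(xbar / n))
    return exact, approx


exact, approx = poisson_limits(xbar=4.0, n=50)
print(exact)
print(approx)
```

The exact pair is centred at $\bar{x} + z^2/(2n)$ rather than at $\bar{x}$, so the two pairs differ by terms of order $1/n$, which become negligible as $n$ grows.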


Shortest Sets of Confidence Intervals 


19.11. It has been seen in Example 19.1 that in some circumstances at least there 
exist more than one set of confidence intervals, and it is now necessary to consider whether 
any particular set can be regarded as better than the others in any useful sense. The 
problem is analogous to that of estimators, where we found that in general there are many 
different estimators for a parameter, but that we could sometimes find one (such as that 
with minimum variance) which was superior to the rest. 

In Example 19.1 the problem presented itself in rather a specialised form. We found
that for the intervals based on the mean $\bar{x}$ there were infinitely many sets of intervals
according to the way in which we selected $\alpha_0$ and $\alpha_1$ (subject to the condition that
$\alpha_0 + \alpha_1 = 1 + \alpha$). Among these the central intervals are obviously the shortest, for a
given range will include the greatest area of the normal curve if it is centred at the mean
of the curve. We might reasonably say that the central intervals are the best among
those determined by $\bar{x}$.

But it does not follow that they are the shortest of all possible intervals, or even that
such a shortest set exists. It might also happen that for two sets of intervals $c_1$ and $c_2$
those of $c_1$ are shorter than those of $c_2$ in part of the range of $x$'s and longer in other parts.




19.12. We will therefore consider sets of intervals which are shortest on the average.
That is to say, if
$$\delta = t_1 - t_0,$$
we require that

$$\int \delta\,dF = \text{minimum}, \qquad (19.10)$$

where the integral is taken over all $x$'s and is therefore equivalent to

$$\int \cdots \int \delta\, f(x_1, \theta) \cdots f(x_n, \theta)\,dx_1 \cdots dx_n. \qquad (19.11)$$

We now prove a theorem which is very similar to the result that maximum-likelihood
estimators in the limit have minimum variance, namely that in a certain class of intervals
the method of 19.10 gives those which are shortest on the average.

Let $h(x, \theta)$ be a function which has a zero mean value and is such that the sum of
a number of similar functions obeys the Central Limit Theorem. Then

$$\zeta = \frac{\Sigma h}{\sqrt{n\,\mathrm{var}\,h}} \qquad (19.12)$$

is normally distributed in the limit with zero mean and unit variance. $\psi$ of equation
(19.8) is a member of the class $\zeta$. We prove that the average rate of change of $\psi$ with
respect to $\theta$, for each fixed $\theta$, is greater than that of any $\zeta$ except in the trivial case


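$$h = \frac{\partial \log f}{\partial \theta}.$$
Writing $g(x, \theta) = \partial \log f/\partial\theta$ we have

$$\frac{\partial \psi}{\partial \theta} = \frac{1}{\sqrt{n\,\mathrm{var}\,g}} \left\{ \Sigma \frac{\partial g}{\partial \theta} - \frac{\Sigma g}{2\,\mathrm{var}\,g} \frac{\partial\,\mathrm{var}\,g}{\partial \theta} \right\}, \qquad (19.13)$$

$$\frac{\partial \zeta}{\partial \theta} = \frac{1}{\sqrt{n\,\mathrm{var}\,h}} \left\{ \Sigma \frac{\partial h}{\partial \theta} - \frac{\Sigma h}{2\,\mathrm{var}\,h} \frac{\partial\,\mathrm{var}\,h}{\partial \theta} \right\}. \qquad (19.14)$$
Hence

$$E\left(\frac{\partial \psi}{\partial \theta}\right) = \frac{1}{\sqrt{n\,\mathrm{var}\,g}} \left\{ nE\left(\frac{\partial g}{\partial \theta}\right) - \frac{nE(g)}{2\,\mathrm{var}\,g} \frac{\partial\,\mathrm{var}\,g}{\partial \theta} \right\}.$$
Now $E(g) = 0$ and
$$E\left(\frac{\partial g}{\partial \theta}\right) = E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right) = -E\left\{\left(\frac{\partial \log f}{\partial \theta}\right)^2\right\} = -\mathrm{var}\,g.$$
Thus
$$E\left(\frac{\partial \psi}{\partial \theta}\right) = -\frac{nE(g^2)}{\sqrt{n\,\mathrm{var}\,g}} = -\sqrt{n\,\mathrm{var}\,g} = \lambda_\psi, \text{ say}. \qquad (19.15)$$

Similarly,

$$E\left(\frac{\partial \zeta}{\partial \theta}\right) = \sqrt{\frac{n}{\mathrm{var}\,h}}\, E\left(\frac{\partial h}{\partial \theta}\right) = \lambda_\zeta, \text{ say}. \qquad (19.16)$$

Since $E(h) = 0$ we have

$$E\left(\frac{\partial h}{\partial \theta}\right) = \int \frac{\partial h}{\partial \theta} f\,dx = -\int h \frac{\partial f}{\partial \theta}\,dx = -\mathrm{cov}(h, g). \qquad (19.17)$$
Hence
$$\lambda_\psi^2 - \lambda_\zeta^2 = n\,\mathrm{var}\,g - \frac{n}{\mathrm{var}\,h}\,\mathrm{cov}^2(h, g) = \frac{n}{\mathrm{var}\,h} \{\mathrm{var}\,h\,\mathrm{var}\,g - \mathrm{cov}^2(h, g)\}. \qquad (19.18)$$

Thus, unless $h$ is a multiple of $g$, we have
$$\lambda_\psi^2 > \lambda_\zeta^2,$$
which was to be proved.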
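Now if $\psi_\alpha$ is a value such that
$$\frac{1}{\sqrt{2\pi}} \int_{-\psi_\alpha}^{\psi_\alpha} e^{-\frac{1}{2}x^2}\,dx = \alpha,$$
the upper and lower confidence points for central intervals are $\pm\psi_\alpha$ and the values of $\theta$
are the solutions of

$$\frac{\Sigma g}{\sqrt{n\,\mathrm{var}\,g}} = \pm\psi_\alpha, \qquad (19.19)$$
say $t_0$ and $t_1$. Similarly those for any function $h$ are given by
$$\frac{\Sigma h}{\sqrt{n\,\mathrm{var}\,h}} = \pm\psi_\alpha, \qquad (19.20)$$
say $u_0$ and $u_1$. The equations for confidence points are equivalent to
$$\psi(t) = \pm\psi_\alpha,$$
$$\zeta(u) = \pm\psi_\alpha,$$

or, effectively, in large samples, by

$$\psi(\theta_0) + (t - \theta_0)\left(\frac{\partial \psi}{\partial \theta}\right)_0 = \pm\psi_\alpha, \text{ say},$$

$$\zeta(\theta_0) + (u - \theta_0)\left(\frac{\partial \zeta}{\partial \theta}\right)_0 = \pm\psi_\alpha,$$

where $\theta_0$ is a fixed value of $\theta$. When $t = \theta_0$ and $u = \theta_0$ we have $\psi(\theta_0) = \zeta(\theta_0)$. Hence

$$(t - \theta_0)\left(\frac{\partial \psi}{\partial \theta}\right)_0 = (u - \theta_0)\left(\frac{\partial \zeta}{\partial \theta}\right)_0. \qquad (19.21)$$

Now we have just shown that, on the average, $|\partial\psi/\partial\theta| > |\partial\zeta/\partial\theta|$. Hence, on the average,
$$|t - \theta_0| < |u - \theta_0|,$$
and the confidence limits $t$ are closer together than those of any member of the class $u$ for
any fixed value of $\theta$.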


19.13. A comparison of the result we have just proved and the properties of maximum
likelihood estimators in the limit will show the close relation between confidence
intervals and the theory of estimation developed in Chapter 17. In 17.27 we showed,
by considering the quantity $u = \partial \log L/\partial\theta$, that any estimator $t$ which is in the limit
distributed normally about the true value $\theta_0$ cannot have a variance less than

$$\frac{1}{n\,E\left(\dfrac{\partial \log f}{\partial \theta}\right)^2},$$

and that the latter quantity, in the limit, is the variance of the maximum likelihood estimator.
It attains the minimal value when $u$ is constant over samples for which $t$ is constant.

The theorem of 19.12 shows that on the average the intervals determined by the
distribution of $u$ are shorter than those based on any other function with a zero mean value
(obeying the usual conditions as to continuity, etc.). Since the maximum likelihood
estimator has minimum variance, we should expect that confidence intervals based on its
distribution would be shorter than others; and this we now see to be so. For if $u$ is constant
over samples of constant $t$, the distribution of $u$ in all samples is equivalent to that of $t$.


Confidence Intervals and Sufficient Estimators 

19.14. Pursuing this line of thought, we are led to inquire whether sufficient estimators
provide confidence intervals for finite samples and whether they have any minimal
properties of the kind we have just established for large samples.

It is easy to see that sufficient estimators do in fact provide confidence intervals.
If $t$ is sufficient for $\theta$, the likelihood function may be put in the form

$$L = f_1(t, \theta)\, f_2(x_1, \ldots, x_n), \qquad (19.22)$$

and the distribution of $t$ is
$$dF = f_1(t, \theta)\,dt. \qquad (19.23)$$

Given $\alpha$ we can then find $t_0$ and $t_1$ such that $F(t_0, \theta) = 1 - \alpha_0$ and $F(t_1, \theta) = \alpha_1$, and solve
for $\theta$ in terms of $t_0$ and $\alpha_0$ or $t_1$ and $\alpha_1$, as the case may be. This process will provide the
inequalities of the type we require, a proposition which we shall prove formally below
(19.25).


Example 19.5 
In Example 17.8 we saw that 


Q 
Ry 
| 
a. 
8 
f=) 
A 


%ao, pol, 


dp =(™P\™ 
(77) F (np) is 
The distribution function of m = ae is the incomplete J-function 


Tn(mp) fm 
T'(np) ae np ~1), 




We then find the values of m corresponding to « and «, from the tables, and have 


P(m < mo) = % 
iP (m> my) — Ay, 
whence 


p {ie <6 <b =a +a — 1 
= a. 


19.15. The position in regard to minimal properties of confidence intervals based
on sufficient estimators remains somewhat obscure, but one would expect some such properties
to hold even for finite $n$. Since $u = \partial \log L/\partial\theta$ is constant for constant $t$ when $t$ is sufficient,
the variance of $u$ will be a function of the variance of $t$. This, however, is not necessarily
enough to establish the fact that the corresponding confidence intervals are shortest on the
average. It is imaginable that the confidence intervals derived from its distribution might
be longer on the average than those of some other system. This seems rather unlikely,
at least for the ordinary distributions of statistical theory, but apparently no proof has
been given.


19.16. Neyman (1937b) has proposed to apply the phrase "shortest confidence
intervals" to sets of intervals defined in quite a different way. As it does not appear
that such intervals are necessarily the shortest in the sense of possessing the least length,
even on the average, we shall attempt to avoid confusion by calling them "most selective."

Consider a set of intervals $c_0$, typified by $\delta_0$, obeying the condition that

$$P\{\delta_0 \,c\, \theta \mid \theta\} = \alpha, \qquad (19.24)$$

where we write $\delta_0 \,c\, \theta$ (that is, $\delta_0$ "contains" $\theta$) for the more usual $t_0 \le \theta \le t_1$ $(t_1 - t_0 = \delta_0)$.
Let $c_1$ be some other set typified by $\delta_1$ such that

$$P\{\delta_1 \,c\, \theta \mid \theta\} = \alpha. \qquad (19.25)$$

Either set is a permissible set of intervals, as the probability is $\alpha$ in both cases that the
range $\delta$ contains $\theta$.
If now for every $c_1$ we have, for any value $\theta'$ other than the true value,

$$P\{\delta_0 \,c\, \theta' \mid \theta\} \le P\{\delta_1 \,c\, \theta' \mid \theta\}, \qquad (19.26)$$

$c_0$ is said to be most selective.


19.17. The ideas underlying this definition will be clearer from a reading of Chapters
26 and 27 dealing with the Neyman-Pearson theory of inference. We anticipate them here
to the extent of remarking that the object of most selective intervals is to cover the true
value with assigned probability $\alpha$, but to cover other values as little as possible. We may
say of both $c_0$ and $c_1$ that the assertion $\delta \,c\, \theta$ is true in proportion $\alpha$ of the cases. What
marks out $c_0$ for choice as the most selective set is that it covers false values less frequently
than the remaining sets.

The difference between this approach and the one leading to shortest intervals is that
the latter is concerned only with the narrowness of the confidence interval, whereas the
former gives weight to the frequency with which alternative values of $\theta$ are covered. One
concentrates on locating $\theta$ with the smallest margin of error; the other takes into account
the desirability of excluding so far as possible false values of $\theta$ from the interval, so that
mistakes of taking the wrong value are minimised.


19.18. Neyman himself has shown that most selective sets do not usually exist (for
instance, if the distribution is continuous) and has proposed two alternative systems:

(a) most selective one-sided systems (Neyman's "shortest one-sided" sets) which
obey (19.26) only for values of $\theta' - \theta$ which are always positive or always negative;

(b) selective unbiassed systems (Neyman's "short unbiassed" sets) which obey
(19.25) but, in place of (19.26), the further relation

$$P\{\delta \,c\, \theta \mid \theta\} = \alpha \ge P\{\delta \,c\, \theta' \mid \theta\}. \qquad (19.27)$$

In essence these sets amount to a translation into terms of confidence intervals of
certain ideas in the theory of tests of significance, and we may defer consideration of them
until Chapters 26 and 27 are reached.


Generalisation to the Case of Several Parameters 


19.19. We now proceed to generalise the foregoing theory to the case of several
parameters. Although, to simplify the exposition, we shall deal in detail only with a single
variate, the theory is quite general. We begin by extending our notation and introducing
a geometrical terminology which may be regarded as an elaboration of the diagrams of
Figs. 19.1 and 19.2.

Suppose we have a frequency function of known form depending on $l$ unknown parameters,
$\theta_1, \ldots, \theta_l$, and denoted by $f(x, \theta_1, \ldots, \theta_l)$. We may require to estimate either
$\theta_1$ only or several of the $\theta$'s simultaneously. In the first place we consider only the estimation
of a single parameter. To determine confidence limits we require to find two functions
$u_0$ and $u_1$, dependent on the sample values but not on the $\theta$'s, such that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha, \qquad (19.28)$$

where $\alpha$ is the confidence coefficient chosen in advance.

With a sample of $n$ values, $x_1, \ldots, x_n$, we can associate a point in an $n$-dimensional
Euclidean space, and the frequency-distribution will determine a density function for
each such point. The quantities $u_0$ and $u_1$, being functions of the $x$'s, are determined in
this space, and for any given $\alpha$ will lie on two hypersurfaces (the natural extension of the
confidence lines of Fig. 19.1). Between them will lie a Confidence Zone or Region of
Acceptance.

In general we also have to consider a range of values of $\theta$ which are a priori possible.
There will thus be an $l$-dimensional space of $\theta$'s subjoined to the $n$-space, the total region
of variation having $(l + n)$ dimensions; but if we are considering the estimation of $\theta_1$,
this reduces to an $(n + 1)$-space, the other $(l - 1)$ parameters not appearing as variables.

We shall call the sample-space $W$ and denote a point whose co-ordinates are $x_1, \ldots, x_n$
by $E$. We may then write $u_0(E)$, $u_1(E)$ to show that the confidence functions depend
on $E$. The interval $u_1(E) - u_0(E)$ we denote by $\delta(E)$ or $\delta$, and as above we write $\delta \,c\, \theta_1$
to denote $u_0 \le \theta_1 \le u_1$. The region of acceptance or confidence zone we denote by $A$,
and may write $E \in \delta$ or $E \in A$ to indicate that the sample-point lies in the interval $\delta$ or
the region $A$.




19.20. In Fig. 19.4 we have shown two axes $x_1$ and $x_2$ and a third axis corresponding
to the variation of $\theta_1$. The sample-space $W$ is thus two-dimensional. For any given
$\theta_1$, say $\theta_1'$, the space $W$ is a hyperplane (or part of it), one such being shown.


Fig. 19.4.


Take any given pair of values $(x_1, x_2)$ and draw through the point so defined a line
parallel to the $\theta_1$-axis, such as $PQ$ in the figure, cutting the hyperplane at $R$. The two
values of $u_0$ and $u_1$ will give two limits to $\theta_1$ corresponding to two points on this line, say
$U$, $V$. Consider now the lines $PQ$ as $x_1$, $x_2$ vary. In some cases $U$, $V$ will lie on opposite
sides of $R$, and $\theta_1'$ lies inside the interval $UV$. In other cases (as for instance in $U'V'$ shown
in the figure) the contrary is true. The totality of points in the former category determines
the region of acceptance $A$, shaded in the figure. If for any point in $A$ we assert
$\delta \mathrel{c} \theta_1'$, we shall be right; if we assert it for points outside $A$ we shall be wrong.


19.21. Evidently, if the sample-point $E$ falls in the region $A$, the corresponding
$\theta_1$ lies in the confidence interval, and conversely. It follows that the probability of any
fixed $\theta_1$ lying in the confidence interval is the probability that $E$ lies in $A(\theta_1)$; or in
symbols,

$$P\{\delta \mathrel{c} \theta_1 \mid \theta_1, \ldots, \theta_l\} = P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = P\{E \in A(\theta_1) \mid \theta_1, \ldots, \theta_l\}. \tag{19.29}$$

From this it follows that if the confidence functions are determined so that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha$$

we shall have, for all $\theta_1$,

$$P\{E \in A(\theta_1) \mid \theta_1, \ldots, \theta_l\} = \alpha. \tag{19.30}$$

It follows also that for no $\theta_1$ can the region $A$ be empty, for if it were the probability in
(19.30) would be zero.


78 CONFIDENCE INTERVALS 


19.22. If the functions $u_0$ and $u_1$ are single-valued and determined for all $E$, then
any sample-point will fall into at least one region of acceptance. For on the line $PQ$
corresponding to the given $E$ we take an $R$ between $U$ and $V$, and this will define a value of
$\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$.

More importantly, if a sample-point falls in the regions $A(\theta_1')$ and $A(\theta_1'')$ corresponding
to two values of $\theta_1$, $\theta_1'$ and $\theta_1''$, it will fall in the region $A(\theta_1''')$, where $\theta_1'''$ is any value
between $\theta_1'$ and $\theta_1''$. For we have

$$u_0 \le \theta_1' \le u_1, \qquad u_0 \le \theta_1'' \le u_1,$$

and hence

$$u_0 \le \theta_1' \le \theta_1'' \le u_1$$

if $\theta_1''$ is the greater, and hence

$$u_0 \le \theta_1' \le \theta_1''' \le \theta_1'' \le u_1$$

or

$$u_0 \le \theta_1''' \le u_1.$$

Further, if a sample-point falls in any of the regions $A(\theta_1''')$ for the range of $\theta$-values
$\theta_1' < \theta_1''' < \theta_1''$, it must also fall within $A(\theta_1')$ and $A(\theta_1'')$.


19.23. The conditions referred to in the two previous sections are necessary. We
now prove that they are sufficient, that is to say: if for each value of $\theta_1$ there is defined
in the sample-space $W$ a region $A$ such that

(1) $P\{E \in A(\theta_1) \mid \theta_1\} = \alpha$, whatever the value of the $\theta$’s;

(2) For any $E$ there is at least one $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$;

(3) If $E \in A(\theta_1')$ and $E \in A(\theta_1'')$, then $E \in A(\theta_1''')$ for any $\theta_1'''$ between $\theta_1'$ and $\theta_1''$;

(4) If $E \in A(\theta_1)$ for any $\theta_1$ satisfying $\theta_1' < \theta_1 < \theta_1''$, $E \in A(\theta_1')$ and $E \in A(\theta_1'')$;

then $u_0$ and $u_1$, viz. confidence limits for $\theta_1$, are given by taking the lower and upper bounds
of values of $\theta_1$ for which a fixed sample-point falls within $A(\theta_1)$. They are determinate
and single-valued for all $E$, $u_0 \le u_1$, and $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$ for all $\theta_1$.

The lower and upper bounds exist in virtue of condition (2), and the lower is not greater
than the upper. We have then merely to show that $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$, and for
this it is sufficient, in virtue of condition (1), to show that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = P\{E \in A(\theta_1) \mid \theta_1\}. \tag{19.31}$$

We already know that if $E \in A(\theta_1)$ then $u_0 \le \theta_1 \le u_1$; and our result will be established
if we demonstrate the converse.

Suppose it is not true that when $u_0 \le \theta_1 \le u_1$, $E \in A(\theta_1)$. Let $E'$ be a point outside
$A(\theta_1)$ for which $u_0 \le \theta_1 \le u_1$. Then must either $u_0 = \theta_1$ or $u_1 = \theta_1$ or both; for otherwise,
$u_0$ and $u_1$ being the bounds of the values of $\theta_1$ for which $E$ lies in $A(\theta_1)$, there would
exist values $\theta_1'$ and $\theta_1''$ such that $E \in A(\theta_1')$ and $E \in A(\theta_1'')$ and

$$u_0 \le \theta_1' < \theta_1 < \theta_1'' \le u_1,$$

so that, from condition (3), $E \in A(\theta_1)$, which is contrary to assumption.

Thus $u_0 = \theta_1$ or $u_1 = \theta_1$ or both. If both, then $E$ must fall in $A(\theta_1)$, for $u_0$ and $u_1$
are the bounds of $\theta$-values for which this is so, and if they coincide their common value
must be so. Finally, if $u_0 = \theta_1 < u_1$ (and similarly if $u_0 < \theta_1 = u_1$) we see that for
$u_0 < \theta_1''' < u_1$, $E$ must fall in $A(\theta_1''')$ from condition (3), and hence, from condition (4), $E$
must fall in $A(\theta_1')$ and $A(\theta_1'')$ where $\theta_1' = u_0$ and $\theta_1'' = u_1$. Hence it falls in $A(\theta_1)$.
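
The construction in the theorem, taking the lower and upper bounds of the parameter values whose region of acceptance contains the observed sample-point, can be imitated numerically. The following is only a minimal sketch under stated assumptions: the population is taken to be normal with unit variance, the region `in_acceptance_region` is a hypothetical choice (sample mean within 1.96/sqrt(n) of theta), and the bounds are searched on a finite grid rather than found exactly.

```python
import math

def in_acceptance_region(sample, theta, z=1.96):
    """Hypothetical region A(theta): sample mean within z/sqrt(n) of theta."""
    n = len(sample)
    mean = sum(sample) / n
    return abs(mean - theta) <= z / math.sqrt(n)

def limits_by_inversion(sample, grid, z=1.96):
    """Smallest and largest grid values theta for which the sample lies in A(theta)."""
    accepted = [theta for theta in grid if in_acceptance_region(sample, theta, z)]
    if not accepted:  # condition (2) is not met on this grid
        return None
    return min(accepted), max(accepted)

sample = [0.3, -1.1, 0.8, 1.9, 0.4]              # illustrative values only
grid = [i / 1000 for i in range(-5000, 5001)]    # theta from -5 to 5 in steps of 0.001
u0, u1 = limits_by_inversion(sample, grid)

mean = sum(sample) / len(sample)
half = 1.96 / math.sqrt(len(sample))
print(u0, u1)                    # bounds found by inversion
print(mean - half, mean + half)  # direct limits for comparison
```

For this convex region the two pairs of limits agree to within the grid spacing, which is what conditions (3) and (4) lead one to expect.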


19.24. The foregoing theorem gives us a formal solution of the problem of finding
confidence intervals in the general case, but it does not provide a method of finding the
intervals in particular instances. In practice we have three lines of approach: (1) to use
sufficient estimators, (2) to adopt the process known as “studentisation,” and (3) to
“guess” a set of intervals in the light of general knowledge and experience and to verify
that they do or do not satisfy the required conditions.


19.25. Consider the use of sufficient estimators in the general case. If $t_1$ is sufficient
for $\theta_1$ we have

$$L = L_1(t_1, \theta_1)\, L_2(x_1, \ldots, x_n, \theta_2, \ldots, \theta_l). \tag{19.32}$$

The locus $t_1 = \text{constant}$ determines a series of hypersurfaces in the sample-space $W$. If
we regard these hypersurfaces as determining regions in $W$, then $t_1 \le k$, say, determines
a fixed region $K$. The probability that $E$ falls in $K$ is then clearly dependent only on
$t_1$ and $\theta_1$. By appropriate choice of $k$ we can determine $K$ so that

$$P\{E \in K \mid \theta_1\} = \alpha,$$

and hence set up regions of acceptance based on values of $t_1$. We can do so, moreover,
in an infinity of ways, according to the values selected for $\alpha_0$ and $\alpha_1$.


Studentisation 


19.26. In Example 19.1 we considered a simplified problem of estimating the mean
in samples from a normal population with unit variance. Suppose now that we require
to determine confidence limits for the mean $\mu$ in samples from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\} dx.$$

The approach of Example 19.1 would lead us to the conclusion that, for confidence coefficient
0.9545 and central intervals,

$$P\left\{\bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}}\right\} = 0.9545.$$

But we cannot now say that the confidence limits are $\bar{x} \pm 2\sigma/\sqrt{n}$ because $\sigma$ is unknown.

Consider then the distribution of $z = (\bar{x} - \mu)/s$, where $s^2$ is the sample variance. This
is known to be the “Student” form

$$dF \propto \frac{dz}{(1 + z^2)^{n/2}}.$$

(Cf. Example 10.6, vol. I, p. 239.) Given $\alpha$, we can now find $z_0$ and $z_1$ such that

$$\int_{-\infty}^{-z_0} dF = \int_{z_1}^{\infty} dF = \frac{1-\alpha}{2},$$

and hence

$$P\{-z_0 \le z \le z_1\} = \alpha,$$

which is equivalent to

$$P\{\bar{x} - s z_1 \le \mu \le \bar{x} + s z_0\} = \alpha.$$

Hence we may say that $\mu$ lies in the range $\bar{x} - s z_1$ to $\bar{x} + s z_0$ with confidence coefficient
$\alpha$, the range now being independent of either $\mu$ or $\sigma$. In fact, owing to the symmetry of
“Student’s” distribution, $z_0 = z_1$, but this is an accidental circumstance peculiar to the
present case.
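
As a rough numerical illustration of the preceding section, the limit $z_0$ can be found by integrating the “Student” density and solving for the point that leaves the required probability in the centre. This is a sketch only: it assumes $s^2$ is the sample variance with divisor $n$, uses Simpson's rule and bisection (both choices of convenience, not part of the text), and the function names are hypothetical.

```python
import math

def student_z_density(z, n):
    """Density of z = (mean - mu)/s, where s^2 = sum((x - mean)^2)/n."""
    const = math.gamma(n / 2) / (math.sqrt(math.pi) * math.gamma((n - 1) / 2))
    return const * (1 + z * z) ** (-n / 2)

def central_probability(z0, n, steps=2000):
    """P(-z0 <= z <= z0) by Simpson's rule on [0, z0]; steps must be even."""
    h = z0 / steps
    total = student_z_density(0.0, n) + student_z_density(z0, n)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_z_density(i * h, n)
    return 2 * total * h / 3

def z_limit(alpha, n):
    """Solve central_probability(z0, n) = alpha for z0 by bisection."""
    lo, hi = 0.0, 1.0
    while central_probability(hi, n) < alpha:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_probability(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def studentised_interval(sample, alpha):
    """Central limits mean - s*z0, mean + s*z0 for the population mean."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)
    z0 = z_limit(alpha, n)
    return mean - s * z0, mean + s * z0

print(studentised_interval([4.1, 5.3, 3.8, 6.0, 4.7], 0.95))  # illustrative data
```

The limits depend on the sample only through its mean and $s$; neither $\mu$ nor $\sigma$ is needed to compute them.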




19.27. The possibility of finding confidence intervals in this case arose from our
being able to find a statistic $z$, depending only on the parameter under estimate, whose
distribution did not contain $\sigma$. A scale parameter can often be eliminated in this way,
although the resulting distributions are not always easy to handle. If, for instance, we
have a statistic $t$ which is of degree $p$ in the variables, then $t/s^p$ is of degree zero, and its
distribution must be independent of the scale parameter. When a statistic is reduced
to independence of the scale in this way it is said to be “studentised,” after “Student”
(W. S. Gosset), who was the first to perceive the significance of the process.
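
The degree-zero property can be checked directly on a small example. The sketch below takes the third central moment as a statistic of degree 3 (an illustrative choice, not one made in the text) and confirms that dividing by $s^3$ leaves a quantity unchanged when every observation is multiplied by the same positive constant.

```python
import math

def central_moment(sample, p):
    """p-th moment about the sample mean (a statistic of degree p)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** p for x in sample) / len(sample)

def studentised(sample, p):
    """Divide the degree-p statistic by s^p to obtain a degree-zero statistic."""
    s = math.sqrt(central_moment(sample, 2))
    return central_moment(sample, p) / s ** p

sample = [1.2, 0.7, 3.4, 2.2, 5.1]      # illustrative values only
scaled = [10 * x for x in sample]       # same data on a scale ten times larger

print(central_moment(sample, 3), central_moment(scaled, 3))  # differ by a factor 10**3
print(studentised(sample, 3), studentised(scaled, 3))        # agree up to rounding
```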


19.28. It is interesting to consider the relation between the studentised mean-statistic
and confidence zones based on sufficient estimators in the normal case. The
distribution of means and variances in normal samples is

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x}-\mu)^2\right\} d\bar{x} \;\; \frac{s^{n-3}}{\sigma^{n-1}} \exp\left(-\frac{n s^2}{2\sigma^2}\right) ds^2 \tag{19.33}$$

and $\bar{x}$, $s$ are jointly sufficient for $\mu$, $\sigma$. In the sample space $W$ the regions of constant $\bar{x}$
are hyperplanes and those of constant $s$ are hyperspheres. If we fix $\bar{x}$ and $s$ the sample-point
$E$ lies on a hypersphere of $(n - 2)$ dimensions. Choose an area on this hypersphere
of content $\alpha$. Then the acceptance region will be obtained by combining all such areas
for all $\bar{x}$ and $s$.

One such region is seen to be the “slice” of the sample-space obtained by rotating
the hyperplane passing through the origin and the point $(1, 1, \ldots, 1)$ through an angle
$\pi\alpha$ (not $2\pi\alpha$, because a half-turn of the plane covers the whole space).

The situation is illustrated for $n = 2$ in Fig. 19.5.


Fig. 19.5.


For any given $\mu'$ the axis of rotation meets the hyperplane $\mu = \mu'$ in the point
$x_1 = x_2 = \mu'$, and the hypercones $(\bar{x} - \mu)/s = \text{constant}$ in the $W$ space become the plane
areas between two straight lines (shaded in the figure). These may be regarded as regions
of acceptance, and one set is that obtained by rotating a plane about the line $x_1 = x_2 = \mu$
through an angle so as to cut off in any plane $\mu = \mu'$ an angle $\beta/2$ on each side of
$x_1 - \mu' = x_2 - \mu'$.

The boundary planes are given by

$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} + \frac{\beta}{2}\right),$$
$$x_1 - \mu = (x_2 - \mu) \tan\left(\frac{\pi}{4} - \frac{\beta}{2}\right),$$

where $\beta = \pi(1 - \alpha)$; or, after a little reduction,

$$\mu = \frac{x_1 + x_2}{2} - \frac{x_1 - x_2}{2} \cot\frac{\beta}{2},$$
$$\mu = \frac{x_1 + x_2}{2} + \frac{x_1 - x_2}{2} \cot\frac{\beta}{2}.$$

$\mu$ then lies in the region of acceptance if

$$\frac{x_1 + x_2}{2} - \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2} \le \mu \le \frac{x_1 + x_2}{2} + \frac{|x_1 - x_2|}{2} \cot\frac{\beta}{2}.$$

These are in fact the limits given by “Student’s” distribution for $n = 2$, since the sample
variance then becomes $\left(\frac{x_1 - x_2}{2}\right)^2$ and

$$\alpha = \frac{1}{\pi} \int_{-z_0}^{z_0} \frac{dz}{1 + z^2} = \frac{2}{\pi} \tan^{-1} z_0,$$

so that

$$z_0 = \tan\left(\frac{\pi}{2} - \frac{\beta}{2}\right) = \cot\frac{\beta}{2}.$$
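
The closed form for samples of two lends itself to a simple sampling experiment. The sketch below draws pairs from a normal population whose mean and standard deviation (3.0 and 7.0) are arbitrary assumptions chosen for illustration, forms the limits with $\cot(\beta/2)$ where $\beta = \pi(1-\alpha)$, and counts how often they cover the mean. It is a Monte Carlo check, not a proof.

```python
import math
import random

def interval_n2(x1, x2, alpha):
    """Limits (x1+x2)/2 -/+ |x1-x2|/2 * cot(beta/2), with beta = pi*(1 - alpha)."""
    beta = math.pi * (1 - alpha)
    half = abs(x1 - x2) / 2 / math.tan(beta / 2)   # 1/tan is the cotangent
    centre = (x1 + x2) / 2
    return centre - half, centre + half

def coverage(alpha, mu=3.0, sigma=7.0, trials=100_000, seed=1):
    """Proportion of simulated samples of two whose interval covers mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        lo, hi = interval_n2(rng.gauss(mu, sigma), rng.gauss(mu, sigma), alpha)
        hits += lo <= mu <= hi
    return hits / trials

print(coverage(0.90))   # should lie close to 0.90 whatever mu and sigma are used
```

Changing `mu` or `sigma` in the call should leave the proportion essentially unaltered, which is the point of studentisation.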


19.29. Tables or diagrams of the confidence intervals for selected values of $\alpha$ have
been given for the following parameters:

(a) the proportion $\varpi$ in the binomial (Clopper and Pearson, 1934);

(b) the parameter of the Poisson distribution (Garwood, 1936; Ricker, 1937);

(c) the correlation coefficient in normal samples (David, 1938a);

(d) the median in samples from any population (K. R. Nair, 1940b).

In addition, results for the mean of a normal population may be obtained from “Student’s”
integral as shown above. Those for the variance of a normal population may be obtained
from the $\Gamma$-function or the equivalent $\chi^2$-integral. For simultaneous estimation of mean
and variance there are difficulties, as we proceed to show.


19.30. It might have been expected that the foregoing theory could be generalised
to give simultaneous pairs of confidence intervals for two unknown parameters when
intervals for each separately cannot be found. Very little progress in this direction has,
however, been made. The difficulty may be illustrated by reference to the joint distribution
of mean and variance (19.33). From the independent distributions of $(\bar{x} - \mu)/\sigma$ and
$s/\sigma$ we can, given $\alpha$, $\beta$, find $t_0$, $t_1$ and $u_0$, $u_1$ such that

$$P\left\{-t_0 \le \frac{\bar{x} - \mu}{\sigma} \le t_1\right\} = \alpha, \qquad P\left\{u_0 \le \frac{s}{\sigma} \le u_1\right\} = \beta,$$

where the $t$’s and $u$’s depend only on sample values and $\alpha$, $\beta$ may be chosen at will. The
inequalities are equivalent to

$$\bar{x} - t_1 \sigma \le \mu \le \bar{x} + t_0 \sigma, \qquad \frac{s}{u_1} \le \sigma \le \frac{s}{u_0}, \tag{19.34}$$

and these give

$$\bar{x} - \frac{t_1}{u_0} s \le \mu \le \bar{x} + \frac{t_0}{u_0} s.$$

But can we then infer that

$$P\left\{\bar{x} - \frac{t_1}{u_0} s \le \mu \le \bar{x} + \frac{t_0}{u_0} s\right\} = \gamma, \tag{19.35}$$

where $\gamma$ is a constant dependent on $\alpha$ and $\beta$? We cannot. This equation is, in fact,
not generally true. The fact can be verified by considering the distribution of the statistic
$\bar{x} - ks$ and showing that its distribution function $F(u)$ is not independent of $\mu$ and $\sigma$.


19.31. In the next chapter we shall see that a similar problem, giving rise to Behrens’ 
test, provides a crucial point of difference between the theory of confidence intervals and 
that of fiducial intervals. All we need say here is that from the point of view of the former 
the problem of simultaneous confidence intervals for several parameters remains unsolved, 
except of course in the degenerate case when we can find independent intervals for each 
parameter separately. 


19.32. In conclusion we indicate without proof a few results which have recently 
been obtained. 

(1) Wilks and Daly (1939b) have generalised the theorem of 19.12 to the case of several
parameters. Under fairly general conditions the confidence regions which are shortest
on the average are given by


as {ov se a} Zs 


where $(a_{ij})$ is the inverse matrix to that whose general element is

$$E\left(\frac{\partial \log f}{\partial \theta_i} \, \frac{\partial \log f}{\partial \theta_j}\right)$$

and $\chi_\alpha^2$ is such that $P(\chi^2 \le \chi_\alpha^2) = \alpha$, the probability being calculated from the
$\chi^2$-distribution with $\nu = l$. This is clearly related to the result of 17.46 giving the limiting forms
of variances and covariances of maximum likelihood estimators.

(2) Wald (1942) has considered the problem of large samples from the point of view
of most selective sets (“shortest” in Neyman’s sense) and has proved results somewhat
similar to those of Wilks and Daly.

(3) Wald and Wolfowitz (1939b, 1941c) and Kolmogoroff (1941) have considered the
problem of setting confidence limits to the terminals of an unknown frequency-distribution.


NOTES AND REFERENCES 


When the theory of confidence intervals and that of fiducial intervals were first developed
many statisticians regarded them as equivalent. In papers written between 1930
and 1938 “confidence limits” and “fiducial limits” are often used in the same sense;
and even where a distinction of approach was drawn the results given by the two methods
appeared identical. The case of Behrens’ test, however, provided an illustration where
the methods lead to different results; see the following chapter.

The fiducial approach is due to R. A. Fisher, references being given at the end of
Chapter 20. The approach of the present chapter has been developed mainly by Neyman
(see particularly 1937b), E. S. Pearson, Wilks (1938b, c, 1939a and, with Daly, 1939b),
Wald (1939a, 1942), Welch (1939a), and Bartlett (1936a, 1939a). A number of the references
to Chapters 26 and 27 are also relevant.

Confidence intervals can be obtained for the median and other quantiles which are
independent of the form of distribution. See Thompson (1936), Savur (1937a) and K. R.
Nair (1940b), and compare Exercise 19.5.


EXERCISES 
19.1. Show that for the rectangular population

$$dF = \frac{dx}{\theta}, \qquad 0 \le x \le \theta,$$

and confidence coefficient $\alpha$, confidence limits for $\theta$ are $t$ and $t/\psi$ where $t$ is the sample range
and $\psi$ is given by

$$\psi^{n-1}\{n - (n-1)\psi\} = 1 - \alpha.$$

(Wilks, 1938c.)


19.2. Show that, for the distribution of the previous exercise, confidence limits
for samples of two, $x_1$ and $x_2$, are
Hy + 2, Hy + Xe 


l+f(l—«) 1—vy(l—2«) 


(Neyman, 1937b.)


19.3. Show also, in the case of the previous exercises, that if $L$ is the larger of a
sample of two, confidence limits are

$$L, \qquad \frac{L}{\sqrt{1 - \alpha}}.$$

(Neyman, 1937b.)

Show further that if $M$ is the largest of samples of four, confidence limits are

$$M, \qquad \frac{M}{\sqrt[4]{1 - \alpha}}.$$

(For an experimental verification, see Frankel and Kullback, 1940.)




19.4. Show that, for the distribution

$$dF = \theta e^{-\theta x}\, dx, \qquad 0 \le x \le \infty,$$

central confidence limits for large samples with $\alpha = 0.95$ are given by

$$\theta = \frac{1 \pm 1.96/\sqrt{n}}{\bar{x}}.$$

(Wilks, 1938c.)


19.5. If a frequency function is continuous, the probability that the $k$th of a sample
of $n$ (arranged in ascending order of magnitude) lies in the range $dx$ is

$$\frac{1}{B(k, n-k+1)} F^{k-1} (1 - F)^{n-k}\, dF,$$

where $F$ is the distribution function. Deduce that

$$P\{x_k \le M\} = \frac{1}{B(k, n-k+1)} \int_0^{1/2} F^{k-1} (1 - F)^{n-k}\, dF,$$

where $x_k$ is the $k$th value and $M$ is the median, and hence show how to determine confidence
intervals for $M$ from the incomplete B-function.

Generalise the result for quantiles. Show that the results do not hold for discontinuous
distributions.

(Thompson, 1936.)


CHAPTER 20 
FIDUCIAL INFERENCE 


20.1. We now proceed to examine a type of inference known as fiducial. As in
other methods of estimation, given a distribution of known form depending on an unknown
parameter $\theta$, we shall attempt to find limits between which $\theta$ lies in some sense associated
with the theory of probability. To that extent our present approach is similar to the
use of estimators with their associated sampling error and to the use of confidence intervals;
but it is distinct from the latter both in essential ideas and in some of the results to which
it leads.


20.2. Consider samples of $n$ from a normal population of unknown mean $\mu$ and
unit variance. The sample-mean $\bar{x}$ is sufficient for $\mu$ and its distribution is

$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\bar{x}. \tag{20.1}$$

In speaking of a distribution in this sense we regard $\mu$ as fixed and consider the totality
of values of $\bar{x}$ derived by random sampling from the population with given $\mu$. The proportion
of samples falling in a range $d\bar{x}$ is then given by (20.1), which holds for each
value of $\mu$.

We now change our viewpoint and consider a different kind of distribution based on
(20.1). If we are given a value of $\bar{x}$ from a sample, what are the values of $\mu$ which could
have given rise to this value to any fixed level of probability? For any value $h$ of the
deviation $\bar{x} - \mu$, we know that the probability of the inequality

$$\bar{x} - \mu \le h \tag{20.2}$$

being true is $\alpha$, where $\alpha$ depends on $h$ and is in fact

$$\alpha = \int_{-\infty}^{h} \sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n t^2}{2}\right) dt. \tag{20.3}$$

Looking at this the other way round, we may say that given any $\alpha$ we can find $h$, a function
of $\alpha$ only, such that

$$\mu \ge \bar{x} - h \tag{20.4}$$

is true with probability $\alpha$. For any fixed $\bar{x}$ this gives us a distribution of $\mu$. Consider
in fact the equation

$$\mu = \bar{x} - h. \tag{20.5}$$

If $\mu$ has a distribution function $F(\mu)$, we have, since (20.4) is true with probability $\alpha$,

$$1 - \alpha = F(\mu) = 1 - \int_{-\infty}^{h} \sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n t^2}{2}\right) dt,$$

whence

$$F(\mu) = \int_{h}^{\infty} \sqrt{\frac{n}{2\pi}} \exp\left(-\frac{n t^2}{2}\right) dt.$$

But in virtue of (20.5), $d\mu = -dh$ and $h = \bar{x} - \mu$. Thus

$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu. \tag{20.6}$$

This is called the fiducial distribution of $\mu$.




20.3. It so happens that in this example the non-differential parts of (20.6) and
(20.1) are the same. This is not essential although it is not infrequent. The crucial
point of difference, however, lies in the appearance of the differential element $d\mu$, relating
to the variation of $\mu$, and the disappearance of $d\bar{x}$ relating to the variation of $\bar{x}$. We have
derived a distribution of the parameter $\mu$ from that of the random variable $\bar{x}$ by transferring
our attention in (20.4) from $\bar{x}$ to $\mu$ and regarding the inequality as still satisfied
with probability $\alpha$.


20.4. We note in the first place that this distribution is not necessarily existent.
When we come to make an inference in any particular case we do not assume that $\mu$ is
itself distributed in the fiducial form in the sense that it has been chosen at random from
an existent population of $\mu$’s of that form. Such a prior distribution, which would be
required for the application of Bayes’ theorem, is not admissible from the point of view
of the frequency theory of probability. The fiducial distribution is a hypothetical one of
conceivable values of $\mu$. We attach probabilities to these values, or rather to values in the
range $d\mu$, by identifying them with the probabilities (based on frequency) which are derived
from the distribution of a sufficient estimator of $\mu$. For this reason the fiducial distribution
is not a frequency-distribution in the ordinary sense; but it is a probability distribution
in its own special sense. We use it to make statements of the kind: among the values
of $\mu$ which are possible, only those in a certain range give rise to the observed $\bar{x}$ with
probability $\alpha$, and hence we will locate $\mu$ in that range.


20.5. In our present example the argument would proceed as follows. From equation
(20.6) and the use of the normal integral, the probability that $\mu - \bar{x}$ does not exceed a
certain $h$ is ascertainable as a function of $h$; for instance,

$$P\left\{\mu - \bar{x} \le \frac{2}{\sqrt{n}}\right\} = 0.97725.$$

If we regard a probability as high as this as acceptable, we may say that $\mu \le \bar{x} + 2/\sqrt{n}$.

This result is equivalent to that given by the theory of confidence intervals, for if we
assert $\mu \le \bar{x} + 2/\sqrt{n}$ we shall be right in the long run in 97.725 per cent. of the cases. This
identity of result is found in most elementary cases where a single parameter is concerned,
but is to be regarded as accidental. In the theory of confidence intervals it is fundamental
(a) that the assertion as to the parameter lying in a given range should be true in an assigned
proportion $\alpha$ of the cases, and (b) that no assumption need be made as to the prior distribution
of the parameter, either in the frequency sense or in the fiducial sense. In fiducial
theory it is not necessary that (a) should be true, but the fiducial distribution is
a fundamental part of the inference.
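
The figure 0.97725 is simply the normal integral taken up to two standard deviations, since $\sqrt{n}(\mu - \bar{x})$ is distributed fiducially as a unit normal variate. A minimal check, with hypothetical function names and an arbitrary illustrative sample mean:

```python
import math

def normal_cdf(x):
    """Distribution function of the unit normal, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def fiducial_upper_limit(sample_mean, n, h=2.0):
    """Upper limit mean + h/sqrt(n) for mu when the variance is known to be 1."""
    return sample_mean + h / math.sqrt(n)

print(normal_cdf(2.0))                   # probability attached to mu <= mean + 2/sqrt(n)
print(2 * normal_cdf(2.0) - 1)           # the two-sided value 0.9545 quoted in 19.26
print(fiducial_upper_limit(10.4, 25))    # illustrative numbers only
```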


20.6. There is a further distinction between the two theories. In that of confidence
intervals it is possible to have two entirely different sets for the same parameter, and in
fact part of that theory is devoted to finding “best” sets among the possible ones. In
fiducial theory such a state of affairs must not be possible, for different limits would imply
different fiducial distributions for the same parameter on the same evidence. This is avoided
by confining fiducial distributions to those based on sufficient estimators, or more generally
on a set of estimators which together avoid all loss of information. Since such estimators
alone contain all the information relevant to the problem of estimation they alone can
give the fiducial distributions accurately. It follows, of course, that where no sufficient
estimator (or estimator with complete set of ancillary estimators) can be found, the
fiducial method is inapplicable.


20.7. Generally, let $F(t, \theta)$ be the distribution function of a sufficient estimator $t$
for a parameter $\theta$. Then for the frequency distribution of $t$ we have

$$dF = \frac{\partial F(t, \theta)}{\partial t}\, dt. \tag{20.7}$$

$F(t, \theta)$ is the probability that a random value of the estimator does not exceed a given
value $t$. In accordance with the fiducial principle, this may be equated to the probability
that for fixed $t$ the value of the parameter will exceed a given value $\theta$, so that for the
fiducial distribution of $\theta$ we have

$$dF = \frac{\partial}{\partial \theta}\{1 - F(t, \theta)\}\, d\theta = -\frac{\partial F(t, \theta)}{\partial \theta}\, d\theta. \tag{20.8}$$

This shows the general relation between the frequency-distribution of the estimator and
the fiducial distribution of the parameter.
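
The relation (20.8) can be tried on the example of 20.2, where the distribution function of the sample mean is known in closed form. The sketch below assumes a normal population with unit variance, differentiates $F(t, \theta)$ numerically with respect to $\theta$ by a central difference (a convenience, not a method from the text), and compares the result with the density in (20.6).

```python
import math

def dist_function(t, theta, n):
    """F(t, theta): probability that the mean of n unit-variance normals is <= t."""
    return 0.5 * (1 + math.erf(math.sqrt(n / 2) * (t - theta)))

def fiducial_density(t, theta, n, eps=1e-5):
    """-dF/dtheta for fixed t, approximated by a central difference."""
    upper = dist_function(t, theta + eps, n)
    lower = dist_function(t, theta - eps, n)
    return -(upper - lower) / (2 * eps)

def closed_form(t, theta, n):
    """The density appearing in (20.6), with t in place of the sample mean."""
    return math.sqrt(n / (2 * math.pi)) * math.exp(-n * (t - theta) ** 2 / 2)

for theta in (-0.2, 0.1, 0.4, 0.9):      # illustrative parameter values
    print(theta, fiducial_density(0.4, theta, 9), closed_form(0.4, theta, 9))
```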


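Example 20.1


If $p$ is known, the estimator $\hat{\theta} = \bar{x}/p$ is sufficient for $\theta$ in samples from

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\theta^p\, \Gamma(p)}\, dx, \qquad 0 \le x \le \infty,$$

the distribution of $\hat{\theta}$ being, in fact,

$$dF = \left(\frac{np}{\theta}\right)^{np} \frac{\hat{\theta}^{np-1}}{\Gamma(np)} \exp\left(-\frac{np\hat{\theta}}{\theta}\right) d\hat{\theta}.$$

(Cf. Example 17.8.) We may write this in the form

$$dF = \left(\frac{np\hat{\theta}}{\theta}\right)^{np-1} \frac{\exp(-np\hat{\theta}/\theta)}{\Gamma(np)}\, d\left(\frac{np\hat{\theta}}{\theta}\right). \tag{20.9}$$

It is then clear that, since

$$\frac{\partial F}{\partial \theta} = -\frac{\hat{\theta}}{\theta}\, \frac{\partial F}{\partial \hat{\theta}},$$

the corresponding fiducial distribution of $\theta$ is

$$dF = \left(\frac{np\hat{\theta}}{\theta}\right)^{np-1} \frac{\exp(-np\hat{\theta}/\theta)}{\Gamma(np)}\, \frac{np\hat{\theta}}{\theta^2}\, d\theta, \tag{20.10}$$

which may also be put in the form (20.9), provided that we interpret the differential element
now as relating to $\theta$ and not to $\hat{\theta}$. It will be noticed that we have replaced $d\hat{\theta}$ by
$\dfrac{\hat{\theta}}{\theta}\, d\theta$, not merely by $d\theta$.

From the fiducial distribution (20.10) we can find the probability that $\theta$ lies in a certain
range dependent on the observed $\hat{\theta}$ and the chosen probability $\alpha$. This is in fact the same
range that we should obtain by applying confidence intervals to (20.9). Once again the
results of the two methods are the same.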




Fiducial Inference based on “Student’s” Distribution

20.8. Consider now the estimation of the mean $\mu$ in samples from a normal population
with unknown variance $\sigma^2$. The treatment of 20.2 is no longer of use, for it would
result in a fiducial distribution of $\mu$ containing the unknown $\sigma$. We therefore “studentise”
the problem by considering the distribution of

$$t = \frac{(\bar{x} - \mu)\sqrt{n}}{s'} \tag{20.11}$$

which is independent of $\sigma$, being in fact

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}, \tag{20.12}$$

where $\nu = n - 1$. Here $s'^2$ is the unbiassed estimate of the population variance

$$s'^2 = \frac{1}{n-1} \sum (x - \bar{x})^2.$$

The distribution of $t$ may be written

$$dF \propto \frac{d\bar{x}}{\left\{1 + \dfrac{n(\bar{x} - \mu)^2}{s'^2 (n-1)}\right\}^{\frac{1}{2}n}}, \tag{20.13}$$

and the fiducial distribution of $\mu$ is accordingly

$$dF \propto \frac{d\mu}{\left\{1 + \dfrac{n(\bar{x} - \mu)^2}{s'^2 (n-1)}\right\}^{\frac{1}{2}n}}. \tag{20.14}$$

In the usual way we can find two constants, for any given $\alpha$, such that, from (20.14),

$$P\{\mu_0 \le \mu \le \mu_1\} = \alpha, \tag{20.15}$$

the probability being based on (20.14) and therefore to be understood in the fiducial sense.
Had we worked with (20.12) or (20.13) we should have found $t_0$, $t_1$ such that

$$P\{-t_0 \le t \le t_1\} = \alpha,$$

which is equivalent to

$$P\left\{\bar{x} - \frac{s' t_1}{\sqrt{n}} \le \mu \le \bar{x} + \frac{s' t_0}{\sqrt{n}}\right\} = \alpha. \tag{20.16}$$

This may be interpreted in the sense of confidence intervals, i.e. that in asserting the
inequality in (20.16) we should be right in a proportion $\alpha$ of the cases in the long
run. (20.15) does not rest on this statement as to frequency, though the limits to which
it leads are the same and the statement happens to be true.


20.9. The case we have just discussed raises a new point. Is it still true that
the fiducial distribution is unique, and is it consistent with the distributions of $\mu$ and $\sigma$
separately? The distribution is based only on the sufficient estimators $\bar{x}$ and $s'$ (which
are jointly but not separately sufficient for $\mu$ and $\sigma$) and we should expect this to be so.
But the matter requires investigation, for we are here using a fiducial distribution based on
two estimators.

The simultaneous distribution of $\bar{x}$ and $s'$ is

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x} \left(\frac{s'}{\sigma}\right)^{n-2} \exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\} \frac{ds'}{\sigma}. \tag{20.17}$$

If we were considering fiducial limits for $\mu$ with known $\sigma$ we should use the distribution

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\bar{x}.$$

If we were considering fiducial limits for $\sigma$ with known $\mu$ we should not use the other factor
in (20.17),

$$dF \propto \left(\frac{s'}{\sigma}\right)^{n-2} \exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\} \frac{ds'}{\sigma}, \tag{20.18}$$

for in such circumstances $s'$ is not sufficient for $\sigma$, the appropriate estimator being
$\frac{1}{n}\sum (x - \mu)^2$. The question is, what form of fiducial distribution must hold for $\sigma$ in order
that the “Student” form (20.14) should hold for $\mu$ when $\sigma$ is unknown?

Suppose the fiducial distribution is $f(s', \sigma)\, d\sigma$. We have then for the joint fiducial
distribution of $\mu$ and $\sigma$,

$$dF \propto \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} d\mu\, f(s', \sigma)\, d\sigma.$$

We have therefore to solve

$$\left[\int_0^\infty \frac{1}{\sigma} \exp\left\{-\frac{n}{2\sigma^2}(\bar{x} - \mu)^2\right\} f(s', \sigma)\, d\sigma\right] d\mu = \frac{k\, d\mu}{\left\{1 + \dfrac{n(\bar{x} - \mu)^2}{s'^2 (n-1)}\right\}^{\frac{1}{2}n}} \tag{20.19}$$

where $k$ is some constant. Putting $(\mu - \bar{x})^2 = \alpha$, $\dfrac{n}{2\sigma^2} = \beta$, we have then to solve

$$\int_0^\infty e^{-\alpha\beta}\, \frac{1}{2\beta}\, f\left(s', \sqrt{\frac{n}{2\beta}}\right) d\beta = \frac{k}{\left\{1 + \dfrac{n\alpha}{(n-1)s'^2}\right\}^{\frac{1}{2}n}}.$$

Regarding $\alpha$ as the complex quantity $-it$, we see that $\dfrac{1}{2\beta} f\left(s', \sqrt{\dfrac{n}{2\beta}}\right)$ is, apart from a
constant factor, the frequency function whose characteristic function is
$1 \Big/ \left\{1 - \dfrac{nit}{(n-1)s'^2}\right\}^{\frac{1}{2}n}$, which gives

$$\frac{1}{2\beta}\, f\left(s', \sqrt{\frac{n}{2\beta}}\right) \propto \beta^{\frac{1}{2}n - 1} \exp\left\{-\frac{(n-1)s'^2}{n}\beta\right\},$$

from which we find

$$f(s', \sigma) \propto \frac{1}{\sigma^n} \exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\},$$

or, on evaluation of the constant,

$$f(s', \sigma)\, d\sigma = \frac{2}{\Gamma\left(\frac{n-1}{2}\right)} \left\{\frac{(n-1)s'^2}{2\sigma^2}\right\}^{\frac{1}{2}(n-1)} \exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\} \frac{d\sigma}{\sigma}. \tag{20.20}$$

This, then, is the fiducial distribution which $\sigma$ must obey. We should have arrived at
the same result had we taken (20.18) and transformed it to the fiducial form, as if it related
to $s'$ and $\sigma$ only and the former were sufficient for the latter.

It appears, then, that in this case at least the fiducial method gives consistent results
when two parameters are involved. The general problem of many parameters presents
difficulties and has not been elucidated to any great extent.


The Logic of Fiducial Inference 

20.10. The notion of fiducial probability was introduced by Fisher (1930) for the
case of a single parameter. Regarding the estimate $t$ as fixed, Fisher considers the distribution
of values of $\theta$ for which $t$ can be regarded as a representative estimate: representative,
that is to say, in the sense that it could have arisen by random sampling from the
population specified by $\theta$. As pointed out above, this does not mean that we are regarding
the true value of $\theta$ as a member of an existing population. Rather, we are considering the
possible values of $\theta$ and attaching to each value a measure of our confidence in it, based
on the probability that it could have given rise to the observed $t$.

If I interpret him correctly, Fisher would regard a fiducial distribution as a frequency-distribution.
This implies that $\theta$ is regarded as a random variable. It appears to me,
however, that it is not a random variable in the ordinary sense of the frequency theory
of probability, in which values of $\theta$ either are or can be generated by an actual sampling
process. We can never test whether the fiducial distribution holds in the frequency sense
by drawing a number of values and comparing observation with theory. Nor, in calculating
fiducial limits of the type $\theta = t + h(\alpha)$, do we imply that the proportion of cases
for which $\theta \le t + h$ is true will be $\alpha$ in the long run.


20.11. The reader has a choice of several attitudes towards the foundations of the 
fiducial argument: (a) he can accept the argument as involving a new postulate of infer- 
ence ; (b) he can regard it as sanctioned by the approach of the previous section ; or (c) he 
can, so far as estimates based on a single parameter are concerned, console himself with 
the thought that the results of the process are the same as those given by the theory of 
confidence intervals. 


20.12. Although Fisher is careful to emphasise the distinction between his own 
approach and that based on Bayes’ postulate, it is interesting to note that the theory of 
inverse probability as modified by Jeffreys gives results which are in many cases identical 
with those of fiducial inference. 

In the example of 20.2, for instance, suppose that the prior distribution of $\mu$ is $f(\mu)\, d\mu$.
Then for any given $\bar{x}$ the posterior probability of $\mu$ is

$$dF = f(\mu)\, d\mu\, \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\}. \tag{20.21}$$

If the total probability is unity we have

$$\int_{-\infty}^{\infty} f(\mu) \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu)^2\right\} d\mu = 1. \tag{20.22}$$

Clearly $f(\mu) = 1$ is a solution, and we may use characteristic functions to show that it is
the only solution. In fact we have from (20.22), writing $it$ for $n\bar{x}$,

$$\int_{-\infty}^{\infty} f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) e^{it\mu}\, d\mu = \sqrt{\frac{2\pi}{n}} \exp\left(-\frac{t^2}{2n}\right).$$

The expression on the right is the characteristic function of $\exp\left(-\dfrac{n\mu^2}{2}\right)$, and hence

$$f(\mu) \exp\left(-\frac{n\mu^2}{2}\right) = \exp\left(-\frac{n\mu^2}{2}\right),$$

or $f(\mu) = 1$.

We have, then, for the posterior probability distribution of $\mu$,

$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\mu - \bar{x})^2\right\} d\mu, \tag{20.23}$$

which is the same as the fiducial distribution. The requirement that $f(\mu) = 1$ is equivalent
to a prior distribution of $\mu$, $dF = d\mu$, which is the form given by Bayes’ postulate for a
parameter which can extend to infinity in either direction.


Example 20.2 
In Example 20.1, a similar argument leads to a prior distribution of 6, 


dé 
dF « 5° 


This is the form given by Jeffreys’ modification of Bayes’ postulate when a parameter 
can extend to infinity in only one direction. 

It does not appear, however, that fiducial and inverse probability always give the 
same results. Consider the distribution of the correlation coefficient in normal samples 
(14.14)— 


$$dF = \frac{(1-\rho^2)^{\frac{1}{2}(n-1)}}{\pi\,(n-3)!}\,(1-r^2)^{\frac{1}{2}(n-4)}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{1-\rho^2r^2}}\right\}dr. \tag{20.24}$$

The argument of the type we have just employed would require a prior distribution of $\rho$,

$$dF \propto \frac{d\rho}{(1-\rho^2)^{3/2}},$$

and the resulting posterior distribution (which is equivalent to that obtained by interchanging
$r$ and $\rho$ in (20.24)) is not the same as we should get by using equation (20.8).

Behrens’ Test

20.13. Suppose we have two samples of $n_1$ and $n_2$ members from normal populations
with possibly unequal variances. The fiducial distributions of $\mu_1$ and $\mu_2$ are of the
“Student” form (20.14). Writing temporarily in this and the next section $s_1^2$ for
$\Sigma(x_1-\bar{x}_1)^2/\{n_1(n_1-1)\}$ and similarly for $s_2^2$, and putting

$$\mu_1 = \bar{x}_1 + s_1t_1, \qquad \mu_2 = \bar{x}_2 + s_2t_2,$$

we have

$$\mu_1-\mu_2 = \bar{x}_1-\bar{x}_2+s_1t_1-s_2t_2. \tag{20.25}$$

If now

$$\epsilon = \frac{\bar{x}_1-\bar{x}_2-(\mu_1-\mu_2)}{\sqrt{s_1^2+s_2^2}}, \tag{20.26}$$

$\epsilon$ depends only on the known quantities $\bar{x}$ and $s$ and the difference of means $\mu_1-\mu_2$.
From the fiducial distributions of $\mu_1$ and $\mu_2$ we can find that of $\epsilon$, and hence make fiducial
statements of the type

$$\bar{x}_1-\bar{x}_2-\epsilon_0\sqrt{s_1^2+s_2^2} \le \mu_1-\mu_2 \le \bar{x}_1-\bar{x}_2+\epsilon_1\sqrt{s_1^2+s_2^2}. \tag{20.27}$$


20.14. The distribution of $\epsilon$ is not of a simple form. Putting $\tan\psi = s_2/s_1$ we see that

$$\epsilon = \frac{\bar{x}_1-\mu_1}{s_1}\cos\psi-\frac{\bar{x}_2-\mu_2}{s_2}\sin\psi, \tag{20.28}$$

so that $\epsilon$ is distributed fiducially as the weighted difference of two variables, each of which
is distributed as “Student’s” $t$. We have then to find the distribution of

$$\epsilon = t_1\cos\psi-t_2\sin\psi$$

where the joint distribution of $t_1$ and $t_2$ is given by

$$dF \propto \frac{dt_1\,dt_2}{\left(1+\dfrac{t_1^2}{n_1-1}\right)^{\frac{1}{2}n_1}\left(1+\dfrac{t_2^2}{n_2-1}\right)^{\frac{1}{2}n_2}}. \tag{20.29}$$

The distribution has been studied by Sukhatme (1938b) and in more detail by Fisher
(1941a). Tables are given for various values of $n_1$, $n_2$ and the ratio $s_1^2/s_2^2$ (or the
equivalent angle $\psi$) showing the values of $\epsilon$ corresponding to given probability levels. Some of
the tables are included in the second (1943) edition of Fisher and Yates’ Statistical Tables for
Agricultural, Biological and Medical Research.
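
Where the published tables are not to hand, significance points of $\epsilon = t_1\cos\psi - t_2\sin\psi$ can be approximated by simulation. The sketch below is a Monte Carlo illustration only, not the method used to construct the tables: it draws $t_1$ and $t_2$ independently from “Student’s” form with $n_1-1$ and $n_2-1$ degrees of freedom, as in (20.29), and reads off an empirical quantile. The function names, the number of draws and the seed are assumptions.

```python
import math
import random

def student_t(nu, rng):
    """One draw from Student's t with nu degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-square with nu d.f.
    return z / math.sqrt(chi2 / nu)

def behrens_quantile(n1, n2, psi, prob, draws=100_000, seed=1):
    """Empirical quantile of epsilon = t1*cos(psi) - t2*sin(psi)."""
    rng = random.Random(seed)
    c, s = math.cos(psi), math.sin(psi)
    eps = sorted(
        student_t(n1 - 1, rng) * c - student_t(n2 - 1, rng) * s
        for _ in range(draws)
    )
    return eps[int(prob * draws)]

# Upper 2.5 per cent. point for n1 = 8, n2 = 12 and equal s1, s2 (psi = 45 degrees).
point = behrens_quantile(8, 12, math.pi / 4, 0.975)
```

When $\psi = 0$ the quantity reduces to $t_1$ alone, so the simulated point should then agree with the ordinary “Student” value, which gives a simple check on the sketch.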


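20.15. The joint distribution of $s_1^2$ and $s_2^2$ is

$$dF \propto s_1^{n_1-3}s_2^{n_2-3}\exp\left[-\tfrac{1}{2}(n_1-1)\frac{s_1^2}{\sigma_1^2}-\tfrac{1}{2}(n_2-1)\frac{s_2^2}{\sigma_2^2}\right]ds_1^2\,ds_2^2.$$

Putting $\rho = s_1^2/s_2^2$ and $w = \tfrac{1}{2}\left\{(n_1-1)\dfrac{s_1^2}{\sigma_1^2}+(n_2-1)\dfrac{s_2^2}{\sigma_2^2}\right\}$,
we find, on a little reduction,

$$dF \propto \frac{\rho^{\frac{1}{2}(n_1-3)}\,d\rho}{\left\{(n_1-1)\dfrac{\rho}{\sigma_1^2}+\dfrac{n_2-1}{\sigma_2^2}\right\}^{\frac{1}{2}(n_1+n_2-2)}}\;w^{\frac{1}{2}(n_1+n_2-4)}e^{-w}\,dw. \tag{20.30}$$

Thus $w$ is distributed (independently of $\rho$) in the Type III form. Further,
$(\bar{x}_1-\mu_1)-(\bar{x}_2-\mu_2)$ is distributed normally about zero mean with variance $\sigma_1^2+\sigma_2^2$.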


Hence, if $\sigma_1^2/\sigma_2^2 = \theta$, we find that the quotient

$$\frac{\{(\bar{x}_1-\mu_1)-(\bar{x}_2-\mu_2)\}^2\,(n_1+n_2-2)}{(\sigma_1^2+\sigma_2^2)\left\{(n_1-1)\dfrac{s_1^2}{\sigma_1^2}+(n_2-1)\dfrac{s_2^2}{\sigma_2^2}\right\}} = \frac{\epsilon^2(1+\rho)(n_1+n_2-2)}{(1+\theta)\left\{(n_1-1)\dfrac{\rho}{\theta}+(n_2-1)\right\}} \tag{20.31}$$

is distributed as $t^2$ with $n_1+n_2-2$ degrees of freedom. (Cf. Example 10.17, vol. I,
p. 248, for the distribution of a normal variate divided by a Type III variate.)

Now if we knew $\theta$ we could find fiducial (or confidence) limits to $\epsilon$, and hence to $\mu_1-\mu_2$,
in the usual way, for the distribution of $\epsilon$ would then be independent of unknown constants
and ascertainable from “Student’s” integral. Since, however, $\theta$ is not known, we require
in turn the fiducial distribution of this quantity. Since

$$e^{2z} = \frac{s_1^2}{\sigma_1^2}\Big/\frac{s_2^2}{\sigma_2^2}$$

is distributed in Fisher’s form (cf. Example 10.18, vol. I, p. 249), the required fiducial
form for $\theta$ can be obtained from that of $z$, which incidentally is equivalent to that of $\rho$
in (20.30). If we express (20.31) as the joint fiducial distribution of $\epsilon$ and $\theta$ and integrate
out for $\theta$, we shall be left with an equivalent form to that derived from (20.29).


20.16. It also follows from the above that the inequality (20.27) is not satisfied in
proportion $\alpha$ of the cases independently of $\theta$, so that the limits to $\mu_1-\mu_2$ are not confidence
limits, although they are fiducial limits. It will, in fact, be evident enough from (20.31)
that if we determine $t_0$ and $t_1$ so that the integral of “Student’s” form between those
limits is $\alpha$, then the corresponding limits for $\epsilon$, say $\epsilon_0$ and $\epsilon_1$, are dependent on the variance
ratio $\theta = \sigma_1^2/\sigma_2^2$. This is fairly evident on general grounds, and the point has been put
beyond doubt by both Fisher (1937b) and Neyman (1941a), who have worked out particular
cases of difference.

The fiducial distribution of $\epsilon$ (which is an extension by Fisher of a result given by
Behrens as early as 1929) thus provides a crucial point of difference between the theory of
fiducial inference and that of confidence intervals.


20.17. In conclusion, we will indicate the viewpoint of Jeffreys towards the type of
problem dealt with by “Student’s” distribution for limits to the mean and Behrens’
distribution for limits to the difference of two means.

If $H$ denotes the general data, we have for the “Student” distribution

$$P\{dt \mid \mu, \sigma, H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}. \tag{20.32}$$

The expression on the left states the probability that $t$ will lie in a given range $dt$ on the
assumption that $H$ is true, the parent mean being $\mu$ and the parent variance $\sigma^2$. Since
$\mu$ and $\sigma$ do not appear on the right they are irrelevant and may be suppressed, and hence

$$P\{dt \mid H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}. \tag{20.33}$$

Suppose now that we assume that

$$P\{dt \mid \bar{x}, s, H\} = f(t)\,dt. \tag{20.34}$$

Then, as before, $\bar{x}$ and $s$ may be suppressed and we have

$$P\{dt \mid H\} = f(t)\,dt, \tag{20.35}$$

and hence, by comparison with (20.33),

$$P\{dt \mid \bar{x}, s, H\} = \frac{k\,dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}. \tag{20.36}$$

We can then proceed to find limits to $t$, given $\bar{x}$ and $s$, in the usual way. Jeffreys
emphasises, however, that this depends on a new postulate expressed by (20.34) which, though
natural, is not trivial. It amounts to an assumption that if we are comparing different
distributions, samples from which give different $\bar{x}$’s and $s$’s, the scale of the distribution
of $\mu$ must be taken proportional to $s$ and its mean displaced by the difference of sample
means.




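20.18. In a similar way it will be found that to arrive at the Behrens distribution
it is necessary to postulate that

$$P\{dt_1\,dt_2 \mid \bar{x}_1, \bar{x}_2, s_1, s_2, H\} = f(t_1, t_2)\,dt_1\,dt_2. \tag{20.37}$$

Jeffreys’ derivation of the Behrens form from Bayes’ theorem would be as follows:

The prior probability of $d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H$ is

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H\} \propto \frac{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2}{\sigma_1\sigma_2}.$$

The likelihood (denoting the data by $D$) is

$$P\{D \mid \mu_1, \mu_2, \sigma_1, \sigma_2, H\} \propto \sigma_1^{-n_1}\sigma_2^{-n_2}\exp\left[-\frac{n_1}{2\sigma_1^2}\{(\mu_1-\bar{x}_1)^2+s_1^2\}-\frac{n_2}{2\sigma_2^2}\{(\mu_2-\bar{x}_2)^2+s_2^2\}\right].$$

Hence, by Bayes’ theorem,

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid D, H\} \propto \frac{1}{\sigma_1^{n_1+1}\sigma_2^{n_2+1}}\exp\left[-\frac{n_1}{2\sigma_1^2}\{(\mu_1-\bar{x}_1)^2+s_1^2\}-\frac{n_2}{2\sigma_2^2}\{(\mu_2-\bar{x}_2)^2+s_2^2\}\right]d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2.$$

Integrating out the values of $\sigma_1$ and $\sigma_2$, we find for the posterior distribution of $\mu_1$ and $\mu_2$
a form which is easily reducible to (20.29).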


20.19. To sum up: so far as concerns problems of estimation the Behrens test is 
accurate both in fiducial theory and in the theory of probability propounded by Jeffreys. 
But the test does not hold in the theory of confidence intervals. In fact the latter fails 
to provide an exact solution to the problem, though we shall see below (21.28) that approxi- 
mations are possible. Fisher has criticised confidence intervals on the ground that they 
do not give an answer to what is admittedly an important question ; but it appears possible 
to maintain consistently that some questions may not have an answer. 


NOTES AND REFERENCES 


For the general theory of fiducial inference see Fisher (1930a, 1933, 1935a, b, 1936c,
1941a). The difficulties of reconciling Behrens’ test with confidence-interval theory were
noticed by Bartlett (1936a) and led to some controversy, for which see Fisher (1937b,
1939a, 1940c), Bartlett (1939a), Yates (1939/), and Neyman (1941a). For Jeffreys’ views
see his papers of 1937b, 1938c, 1939d and 1940.

For the practical application of Behrens’ distribution see Sukhatme (1938b) and Fisher
(1941a). Behrens himself stated his results explicitly only for the case of equality of sample
number, $n_1 = n_2$, the extension being given by Fisher (1935b).


EXERCISES 


20.1. If $\bar{x}$ is the mean of a sample of $n$ values from

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx,$$

$s'^2$ is equal to $\dfrac{1}{n-1}\Sigma(x-\bar{x})^2$, and $x$ is a further independent sample value, show that

$$t = \frac{x-\bar{x}}{s'}\sqrt{\frac{n}{n+1}}$$

is distributed in “Student’s” form with $\nu = n-1$. Hence show that fiducial limits
for $x$ are

$$\bar{x} \pm s't_1\sqrt{\frac{n+1}{n}},$$

where $t_1$ is chosen so that the integral of “Student’s” form between $-t_1$ and $t_1$ is an
assigned probability $\alpha$.
(Fisher, 1935b. This gives an estimate of the next value when $n$ values have
already been chosen, and extends the idea of fiducial limits from parameters
to variates dependent on them.)


20.2. Show similarly that if a sample of $n_1$ values gives mean $\bar{x}_1$ and estimated variance
$s_1^2$, the fiducial distribution of mean $\bar{x}_2$ and estimated variance $s_2^2$ in a second sample of $n_2$ is

$$dF \propto \frac{s_1^{n_1-1}s_2^{n_2-2}\,d\bar{x}_2\,ds_2}{\left\{(n_1-1)s_1^2+(n_2-1)s_2^2+\dfrac{n_1n_2}{n_1+n_2}(\bar{x}_1-\bar{x}_2)^2\right\}^{\frac{1}{2}(n_1+n_2-1)}}.$$

Hence, allowing $n_2$ to tend to infinity, derive the simultaneous fiducial distribution of
$\mu$ and $\sigma$.

(Fisher, 1935b.)


CHAPTER 21 
SOME COMMON TESTS OF SIGNIFICANCE 


Tests of Significance 

21.1. We now pass from the problem of estimation to that of significance. The 
two are closely allied and in practical problems they both arise together as a rule; but 
it is useful to preserve a distinction between them. In estimation we try to find, with 
greater or less accuracy, the value of some parameter in a population which is known to 
be (or assumed to be) dependent on that parameter. In tests of significance we are given 
some value of a parameter beforehand and wish to decide whether it is acceptable in the 
light of the evidence. This is the distinction in its simplest terms, but of course the 
associated problems become increasingly complex when several parameters are concerned. 


21.2. From one point of view the problem of significance is logically anterior to that 
of estimation. Suppose we have records of the yields of two varieties of wheat grown 
under similar conditions, and are interested in a comparison of the average yields of the 
two. Our first question is whether the observed mean yields indicate any difference between 
the varieties—a matter of significance. Not until significant differences are established 
does our interest turn to the magnitude of the difference—a matter of estimation. Again, 
if we have a set of records of only one variety, our primary problem may be to decide 
whether they are consonant with the hypothesis of normality in the parent population, 
whatever its mean and variance ; and only when this point has been settled affirmatively 
do we proceed to estimate those parameters. 

Nevertheless, we have lost very little by taking the problem of estimation first. In 
some practical problems the question of significance is already decided, and in many others 
we use estimates of parameters to test the significance of the latter, in which case estimation 
and significance become different aspects of the same statistical fact. 


21.3. We shall consider the general theory of testing statistical hypotheses in Chapters 
26 and 27. That theory is, however, rather abstract, and we anticipate it to some extent 
in this chapter by giving an account of the principal tests in current use, without for the 
moment going too deeply into their rationale. It will be seen later that there are sometimes 
many significance tests which can be applied to the same problem, and that it is possible 
to lay down criteria for deciding which, if any, are the “ best’. This aspect of the subject 
will not concern us for the present. We shall not discuss whether the tests we describe 
are the best possible (though some of them, in fact, are so) but shall merely present them 
as useful and convenient, albeit perhaps not unique, solutions of our problems. 


21.4. Developments in statistical theory in the last two decades have resulted in 
a great many tests of significance appropriate to special problems. It is not easy to classify 
them and quite impossible to deal extensively with them all. We shall consider them 
under the following heads :— 

(a) Tests of the significance of a specified parameter value.—The typical hypothesis 
here is that a parameter in a population of known form has a specified value (usually 
zero). We wish to know whether the evidence provided by the sample supports the 
hypothesis or not. 

(b) Tests of goodness of fit. The hypothesis is that the population is of a certain
kind which is either fully specified beforehand or can be “ estimated ” with the help 
of the data. We wish to know whether the sample values fit this population in the 
sense that they could have arisen from it by random sampling to any acceptable degree 
of probability. This hypothesis is more general than that of (a) since it concerns 
the whole distribution function and not merely one of its parameters. 

(c) Tests of homogeneity —The hypothesis here concerns two or more populations, 
each providing a contribution to the sample. We wish to test whether the populations 
have certain parameters in common, or in the extreme case, whether they are identical. 
This case can be regarded as an elaboration of (a) where several parameters are simul- 
taneously tested. In the particular case when only two populations are concerned 
we may sometimes reduce it directly to type (a) by considering differences; e.g. if 
we are making a comparison of parent means the hypothesis might be that the single 
difference of means is zero. 

In addition we shall also consider two sets of tests of rather a different kind :— 

(d) Tests of order of occurrence.—The hypothesis here is that the sample members 
occurred in random order, and we wish to ascertain whether the observed order indicates 
any systematic effects, as, for instance, whether there are any cyclical effects in time- 
series. The test here is of the sampling process rather than of parameters of the 
parent population. 

(e) Conditional tests —The hypothesis may be any one of the above types, but 
we restrict the inference to a sub-population for which certain qualities are deter- 
mined by the observed sample values. For instance, we may use the distribution 
of the sample variance $s^2$ for which the mean $\bar{x}$ is equal to the observed value. In
short the variation of sample values is conditioned. Type (d) may from some points 
of view be regarded as a particular case of this type. 


It is not intended to convey that the above five categories are mutually exclusive. 
A test of type (a) may, for example, be conditional or non-conditional. The classification 
will, however, provide some sort of articulation for a rather long chapter and serve to 
explain our sequence of treatment. 


Standard Errors 


21.5. For large samples the test of significance of a parameter can usually be carried
out by standard errors. We find an estimator $t$ of the parameter $\theta$ and consider whether
the given value of $\theta$ falls in the range $t_0 \pm k\sqrt{\operatorname{var} t}$, where $t_0$ is the value of $t$ for the observed
sample and $k$ is a constant chosen at will according to a probability $\alpha$. If so we may accept
the value of $\theta$, at least so far as this test is concerned; if not, we reject it.

If the variance of $t$ does not depend on unknown quantities such as other parameters,
this type of inference is justifiable as an application of the theory of confidence intervals.
In accepting $\theta$ when it falls in the range $t_0 \pm k\sqrt{\operatorname{var} t}$, we shall be right in proportion $\alpha$ of
the cases in the long run. As a refinement we may, of course, use non-central intervals
and locate $\theta$ in an asymmetrical range $t_0 - k_0\sqrt{\operatorname{var} t}$ to $t_0 + k_1\sqrt{\operatorname{var} t}$. The test of
significance is equivalent to the estimation of the true value of $\theta$; and it will clearly be better
if the range of estimation is narrower, for then we reject more wrong values of $\theta$.


21.6. If the variance of the estimator $t$ depends on unknown parameters $\theta_1, \ldots, \theta_k$,
we can usually substitute estimates of those parameters obtained from the sample itself,
provided that the sample is large. For example, we have for normal samples

$$P\left(\mu \le \bar{x}+\frac{2\sigma}{\sqrt{n}}\right) = 0.97725.$$

The sample standard deviation $s$ will differ from $\sigma$ by a quantity of order $1/\sqrt{n}$, so that
to that order

$$P\left(\mu \le \bar{x}+\frac{2s}{\sqrt{n}}\right) = 0.97725.$$

The approximation breaks down for small samples, and more accurate methods are required.
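
The figure 0.97725 is the normal probability integral at two standard errors, and the large-sample rule amounts to replacing $\sigma$ by $s$ in the limit. The following is a minimal sketch of that calculation; the function names and the numerical inputs are hypothetical and serve only as illustration.

```python
import math

def normal_cdf(z):
    """Distribution function of the unit normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_limit(xbar, s, n, k=2.0):
    """Large-sample upper limit xbar + k*s/sqrt(n), with s standing in for sigma."""
    return xbar + k * s / math.sqrt(n)

prob = normal_cdf(2.0)               # probability attached to k = 2
limit = upper_limit(10.0, 2.0, 100)  # hypothetical sample: mean 10, s = 2, n = 100
```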


21.7. The use of standard errors in testing significance has been illustrated in previous 
chapters, and we need not enlarge on the process further. We may, however, remark 
two things :— 

(a) That if the distribution of an estimator $t$ tends to normality for large samples
irrespective of the parent form (as, for instance, is the case with the mean and other moments 
under very general conditions), it is not necessary that the hypothesis should specify the 
parent form. In short, our test of significance is independent of the parent, a valuable 
generality which rarely obtains for small samples. 

(b) That we have justified the logic of reasoning involving the use of standard errors 
by the theory of confidence intervals (and a similar justification can be given in terms 
of fiducial intervals if we use an efficient estimator for which the loss of information tends 
to zero relative to the total information in large samples). This appears to be the most 
satisfactory basis for the use of standard errors. The usual intuitive basis advanced 
(necessarily) in introductory textbooks is not easy to defend. For instance, it is customary
to reject a value of $\theta$ if it gives to an observed $t_0$ or greater value a small probability; and
there is no obvious reason why we should base our inference on the improbability of values
greater than $t_0$, namely on the improbability of something which has not occurred (see 21.55
below). Our present approach shows that in fact the use of standard errors can be justified
logically without invoking a new principle of inference. 


Significance of the Mean in Normal Samples 


21.8. Suppose we have a sample from a parent population which is known to be
normal, but of whose mean and variance we are ignorant. We wish to test the significance
of a given value $\mu_0$ of the mean, that is to say, we wish to consider whether the observations
could, to any acceptable probability, have been derived from a population with mean $\mu_0$,
whatever the variance may be.

We calculate the statistic

$$t = \frac{\bar{x}-\mu_0}{s}\sqrt{\nu}, \tag{21.1}$$

all the quantities in which are given. We know that the distribution of $t$ is

$$dF = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\,\frac{dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}}, \tag{21.2}$$

and hence can find the probability that our calculated value of $t$ is attained or exceeded.
If this is small we reject $\mu_0$; if not, we accept it. What values are regarded as “small”
for this purpose is a matter of convention, but the most frequently used values are 0.05,
0.01 and 0.001.

From the work of the previous two chapters it will be evident that this type of inference
is the confidence- or fiducial-interval approach in a slightly different form. Given
$\alpha$ we can find $-t_1$ and $t_0$ such that the integral of $dF$ in (21.2) between those limits is $\alpha$.
This gives us confidence or fiducial limits to $\mu$ of the type $\bar{x}-st_0/\sqrt{\nu}$ and $\bar{x}+st_1/\sqrt{\nu}$, and if
$\mu_0$ lies in this range we accept it. In particular cases we may have $t_0 = t_1$, in which case
the intervals are central and $1-\alpha$ is the chance of $t_0$ being attained or exceeded
in absolute value; or $t_0 = +\infty$, in which case $\alpha$ is the chance that $-t_1$ will be attained
or exceeded, and no lower limit to $\mu$ is imposed.


Example 21.1 


The weights of fifteen bags of sugar taken from a filling machine are found to be, in
ounces, 16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9, 16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8.
Each bag should be 16 ounces, but some deviation is inevitable. One of the manufac- 
turer’s problems, of course, is to keep this deviation to a minimum, but that is not the 
point we now consider. Our question is: if the machine is supposed to be giving weights 
of 16 ounces on the average, does the sample suggest that it is failing in its purpose ? 

The hypothesis is that the parent mean is 16 ounces and the deviations from this
mean are, in order of magnitude, -0.3 (twice), -0.2 (four times), -0.1 (twice), 0.0
(four times), 0.1 (twice), 0.2 (once). The sample mean of the deviations is thus -0.08 and to
that extent the average of the sample is slightly underweight. Is this a significant effect?

It will be found that $s^2 = 0.0216$ so that

$$t = \frac{-0.08}{\sqrt{0.0216}}\sqrt{14} = -2.04, \qquad \nu = 14.$$

From Appendix Table 3 (vol. I, p. 440) we find that for $\nu = 14$ the probability of a deviation
greater in absolute magnitude than 2.04 is about 2 (1 - 0.969) = 0.062. This is small,
but whether we regard it as significant or not depends on the probabilities we are prepared
to consider as defining significance. The usual values are 0.05 and 0.01, and with such
criteria we should not take the observed value as significant, though it arouses suspicions.

We have here used central intervals, which are usual for the t-test of significance 
of the mean; but it is easy to imagine circumstances in this particular case for which 
non-central intervals might be required. For instance, if the machine was at fault and 
had a true mean filling weight of more than 16 ounces the manufacturer would be giving 
sugar away for nothing. This might be serious, but probably not so serious as if the 
machine was erring in the other direction, which would render him liable to prosecution 
for selling short weight. Suppose he assessed the latter risk as nine times as serious as
the former and was working to a probability level of 0.05. Then he would require
the probability of a negative value of $t$ greater than the significance value to be
0.955 (= 1 - 0.045) but could allow that of a positive value less than the significance value
to be 0.995 (= 1 - 0.005). From Appendix Table 3 we see that this corresponds to
deviations of approximately -1.8 and +3.0. Our observed value is outside this range
and is thus significant. Small as the average shortage is, it would be prudent to overhaul 
the machine and to make sure that it is giving fair weight on the average. 
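
The arithmetic of this example can be reproduced as follows. The sketch takes $s^2$ with divisor $n$, which is the convention that yields the value 0.0216 quoted above, and obtains the tail probability by numerical integration of (21.2) instead of from Appendix Table 3; the helper names and the Simpson rule are assumptions made for the illustration.

```python
import math

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
mu0 = 16.0                      # hypothetical parent mean under test
n = len(weights)
nu = n - 1                      # degrees of freedom

xbar = sum(weights) / n
s2 = sum((w - xbar) ** 2 for w in weights) / n   # divisor n, giving 0.0216
t = (xbar - mu0) * math.sqrt(nu) / math.sqrt(s2)  # statistic (21.1)

def t_density(u, nu):
    """Ordinate of Student's distribution (21.2)."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + u * u / nu) ** (-(nu + 1) / 2)

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Probability of a deviation greater in absolute magnitude than |t|.
p_two_sided = 2 * (0.5 - simpson(lambda u: t_density(u, nu), 0.0, abs(t)))
```

The resulting probability lies a little above 0.05, in agreement with the conclusion drawn from the central interval.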

We may note further that if the sample had occurred in the order 


15.7, 15.7, 15.8, 15.8, 15.8, 15.8, 15.9, 15.9, 16.0, 16.0, 16.0, 16.0, 16.1, 16.1, 16.2


we should almost certainly have concluded that there was something wrong with the 
machine, for the weights are steadily rising. The t-test would give the same result for 
this sample as for the first, since it does not depend on the order of occurrence of the mem- 
bers. Where, therefore, the appearance of individual sample members is ordered in time, 
the t-test alone may fail to reveal significant effects due to the changing of the population 
between drawings. Our data are still such as could have arisen at a single drawing of 
fifteen members from a population with mean equal to 16 ounces; but the data throw 
doubt on the point whether we are really asking the right question in assuming that they 
all came from the same population. We consider the point again below (21.41). 

Before leaving this example, we may note another possible test, cruder than the t-test
but sometimes useful. If the parent mean were really zero, positive and negative deviations
should occur equally frequently in the long run. In our present case there are 8
negative deviations, 3 positive ones and 4 zero. If we allot, conventionally, two of the
last to each group we have 10 negative and 5 positive deviations. The expected number
is 7½, so that the deviation is 2½, with a standard error of $\sqrt{15 \times \tfrac{1}{2} \times \tfrac{1}{2}} = 1.94$. The
observed deviation is very little in excess of this, so we conclude that the preponderance
of negative signs in the sample is not significant of a negative mean in the population.
More exactly, we find that the probability of 5 or fewer positive deviations is the sum of
the first six terms in the binomial $(\tfrac{1}{2}+\tfrac{1}{2})^{15}$, namely 0.151, leading to the same conclusion.
The test is a very rough one since it pays no attention to the magnitude of the deviations;
but it has the advantage of applying to any symmetrical form of parent population for
finite samples.
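
The two figures quoted for this sign test, the standard error 1.94 and the binomial tail 0.151, can be checked directly. The variable names in this short sketch are assumptions; the counts are those of the example after the conventional allotment of the zero deviations.

```python
import math

n = 15          # sample size
positives = 5   # positive deviations after allotting two zeros to each group

expected = n / 2                               # expected number of positive signs
standard_error = math.sqrt(n * 0.5 * 0.5)      # binomial standard error
deviation = expected - positives

# Sum of the first six terms of the binomial (1/2 + 1/2)^15.
tail = sum(math.comb(n, k) for k in range(positives + 1)) / 2 ** n
```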


Properties of the t-Distribution 
21.9. “Student’s ” distribution has numerous applications in the testing of signifi- 
cance apart from the one just considered, and we proceed to study its properties. 


The form (21.2) is a Pearson Type VII and may be transformed to the Beta-distribution
(Type I) by the substitution $\xi = 1\Big/\left(1+\dfrac{t^2}{\nu}\right)$. The distribution function of $t$ may thus
be obtained direct from the B-function. For instance, we have

$$2F-1 = \frac{1}{\sqrt{\nu}\,B\left(\frac{1}{2}\nu, \frac{1}{2}\right)}\int_{-t}^{t}\frac{dt}{\left(1+\dfrac{t^2}{\nu}\right)^{\frac{1}{2}(\nu+1)}} = \frac{1}{B\left(\frac{1}{2}\nu, \frac{1}{2}\right)}\int_{\xi}^{1}\xi^{\frac{1}{2}\nu-1}(1-\xi)^{-\frac{1}{2}}\,d\xi = 1-I_{\xi}\left(\tfrac{1}{2}\nu, \tfrac{1}{2}\right),$$

whence

$$F = 1-\tfrac{1}{2}I_{\xi}\left(\tfrac{1}{2}\nu, \tfrac{1}{2}\right). \tag{21.3}$$

The values of the argument for which $I$ has the values 0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
0.005 and $\nu = 1\,(1)\,30, 40, 60, 120, \infty$, have been tabled to five significant figures by C. M.
Thompson and others (1941a) and can hence be used to derive the values of $t$ corresponding
to those probability levels.
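
The relation (21.3) between the distribution function of $t$ and the incomplete B-function can be verified numerically. The sketch below compares a direct integration of (21.2) with the B-function form for $t \ge 0$; it is restricted to $\nu \ge 2$ so that the B-function integrand stays finite at the origin. The function names and the Simpson rule are assumptions made for the illustration.

```python
import math

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_density(u, nu):
    """Ordinate of Student's distribution (21.2)."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + u * u / nu) ** (-(nu + 1) / 2)

def cdf_direct(t, nu):
    """F(t) by integrating the density from 0 to t."""
    return 0.5 + simpson(lambda u: t_density(u, nu), 0.0, t)

def cdf_via_beta(t, nu):
    """F(t) = 1 - I_xi(nu/2, 1/2) / 2, for t >= 0 and nu >= 2."""
    xi = 1.0 / (1.0 + t * t / nu)
    beta = math.gamma(nu / 2) * math.gamma(0.5) / math.gamma((nu + 1) / 2)
    incomplete = simpson(lambda x: x ** (nu / 2 - 1) * (1 - x) ** -0.5, 0.0, xi)
    return 1.0 - 0.5 * incomplete / beta

direct = cdf_direct(2.04, 14)
via_beta = cdf_via_beta(2.04, 14)
```

For $\nu = 14$ and $t = 2.04$ both routes give the value of about 0.969 used in Example 21.1.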


21.10. Except for special purposes, however, the use of the B-function is unnecessary,
since the distribution function of $t$ itself and tables based thereon are available.
We have

$$-\log\left(1+\frac{t^2}{\nu}\right) = -\frac{t^2}{\nu}+\frac{t^4}{2\nu^2}-\ldots+\frac{(-t^2)^j}{j\nu^j}+\ldots$$

and hence

$$-\frac{\nu+1}{2}\log\left(1+\frac{t^2}{\nu}\right) = -\tfrac{1}{2}t^2+\ldots+\frac{j(-t^2)^{j+1}+(j+1)(-t^2)^j}{2j(j+1)\nu^j}+\ldots \tag{21.4}$$

Further, from the expansion for $\log\Gamma(1+x)$ we find

$$\log\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\frac{1}{2}\nu}\;\Gamma\left(\frac{\nu}{2}\right)} = -\frac{1}{4\nu}+\frac{1}{24\nu^3}-\frac{1}{20\nu^5}+\ldots \tag{21.5}$$

Now as $\nu$ tends to infinity, $t$ tends to the normal form with zero mean and unit variance.
Writing

$$y = \frac{1}{\sqrt{2\pi}}e^{-\frac{1}{2}t^2},$$

we find for the logarithm of the ordinate of (21.2), in descending powers of $\nu$,

$$\log y+\frac{1}{4\nu}(t^4-2t^2-1)-\frac{1}{12\nu^2}(2t^6-3t^4)+\frac{1}{24\nu^3}(3t^8-4t^6+1)-\frac{1}{40\nu^4}(4t^{10}-5t^8)+\frac{1}{60\nu^5}(5t^{12}-6t^{10}-3)-\ldots \tag{21.6}$$

Taking the exponential and integrating from $t$ to $\infty$, we find

$$1-F = \int_t^{\infty}y\,dt+y\left[\frac{1}{4\nu}(t^2+1)t+\frac{1}{96\nu^2}(3t^6-7t^4-5t^2-3)t+\frac{1}{384\nu^3}(t^{10}-11t^8+14t^6+6t^4-3t^2-15)t\right.$$
$$\left.{}+\frac{1}{92160\nu^4}(15t^{14}-375t^{12}+2225t^{10}-2141t^8-939t^6-213t^4-915t^2+945)t+\ldots\right]. \tag{21.7}$$

This is the expression, due to Fisher, which was used by “Student” himself in calculating
the distribution function of $t$ given in Appendix Table 3, Vol. I. For values of $\nu > 18$ the
first four terms of (21.7) give $F$ to an accuracy of about 0.000005.
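
The accuracy of the expansion (21.7) may be examined numerically. The sketch below evaluates the normal tail together with the four correction terms written out in (21.7) and compares the result with a direct numerical integration of (21.2). The function names, the Simpson rule and the trial values $\nu = 20$, $t = 2$ are assumptions made for the illustration.

```python
import math

def simpson(f, a, b, steps=4000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_density(u, nu):
    """Ordinate of Student's distribution (21.2)."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + u * u / nu) ** (-(nu + 1) / 2)

def fisher_tail(t, nu):
    """Upper tail 1 - F from the normal tail and four correction terms of (21.7)."""
    y = math.exp(-0.5 * t * t) / math.sqrt(2 * math.pi)
    normal_tail = 0.5 * math.erfc(t / math.sqrt(2))
    u = t * t
    c1 = (u + 1) * t / (4 * nu)
    c2 = (3 * u**3 - 7 * u**2 - 5 * u - 3) * t / (96 * nu**2)
    c3 = (u**5 - 11 * u**4 + 14 * u**3 + 6 * u**2 - 3 * u - 15) * t / (384 * nu**3)
    c4 = (15 * u**7 - 375 * u**6 + 2225 * u**5 - 2141 * u**4
          - 939 * u**3 - 213 * u**2 - 915 * u + 945) * t / (92160 * nu**4)
    return normal_tail + y * (c1 + c2 + c3 + c4)

def exact_tail(t, nu):
    """Upper tail 1 - F by direct integration of the density."""
    return 0.5 - simpson(lambda v: t_density(v, nu), 0.0, t)

series = fisher_tail(2.0, 20)
exact = exact_tail(2.0, 20)
```

For these trial values the two results agree to well within the accuracy quoted for $\nu > 18$.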


21.11. Tables are also available in the “inverse” form, that is to say, giving values
of $t$ corresponding to specified values of $\nu$ and $F$. Such tables may be derived by
interpolation from the “Student” tables or by the normalisation method of 6.32. In work
involving tests of significance this type of table is perhaps the most convenient, since it
enables one to decide without calculation (other than interpolation for values of the
argument not covered by the tables) whether particular values are significant for chosen
probability $\alpha$. The complement of the probability $\alpha$ is spoken of as a level of significance
and expressed either as a number between 0 and 1 or as a percentage. Similarly the
corresponding values of $t$ are called significance points, and we may speak, for example,
of the 5 per cent. value of $t$, meaning that value for which $F$ is 0.95.

Fisher and Yates (1938a) give the values of $t$ for $\nu = 1\,(1)\,30, 40, 60, 120$ and $\infty$ and
$2(1-F) = 0.9\,(0.1)\,0.1, 0.05, 0.02, 0.01, 0.001$. These tables, it should be remembered,
give the significance points corresponding to twice $1-F$, that is to say the values of $t$
such that the proportion of the distribution outside the range $\pm t$ is $2(1-F)$.
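
An “inverse” table of this kind can also be computed directly by solving $F(t) = $ const. for $t$. The sketch below uses bisection on a numerically integrated distribution function; it is an illustration under stated assumptions (the function names, the Simpson rule and the search range 0 to 50, which is adequate for moderate $\nu$ but not for $\nu = 1$ at extreme levels), not the interpolation or normalisation method mentioned above.

```python
import math

def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def t_density(u, nu):
    """Ordinate of Student's distribution (21.2)."""
    const = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return const * (1 + u * u / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu):
    """Distribution function F(t) for t >= 0."""
    return 0.5 + simpson(lambda u: t_density(u, nu), 0.0, t)

def t_point(prob, nu, lo=0.0, hi=50.0, tol=1e-8):
    """Value of t for which F(t) = prob, with 0.5 <= prob < 1, by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, nu) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Points for 2(1 - F) = 0.05, i.e. F = 0.975, at a few values of nu.
table = {nu: round(t_point(0.975, nu), 3) for nu in (5, 10, 14, 30)}
```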


21.12. The number $\nu$ is usually called the number of degrees of freedom of $t$. This
is an expression which occurs in other connections, and a few words of explanation are
desirable.

It has been seen that the variance of a normal sample is distributed like the sum of
$(n-1)$ squares of independent variates (compare Example 10.5, vol. I, p. 238) and generally,
that if there are $k$ linear relations connecting the original variates, the sum of squares
of the originals is distributed as the sum of squares of $n-k$ independent normal variates of
equal variance. Each linear relation reduces the freedom of the variation, as it were, by unity.
It is thus natural to speak of the number of degrees of freedom, $\nu$, of a function such as
$\chi^2$, meaning thereby that it is distributed as the sum of squares of $\nu$ independent
normal variates with equal variance. The expression only has this natural meaning when
normal variation is concerned.

It so happens that the quantity $t$ depends on a parameter $\nu$ which is convenient for
tabulating its distribution function and is also the number of degrees of freedom of the
statistic $s^2$ entering into the denominator of $t$. $\nu$ may thus, by an extension of the term,
be called the number of degrees of freedom of $t$, but this usage does not imply that $t$ is
distributed as the sum of squares of normal variates.


Distribution of t in Non-normal Case 


21.13. Part of the price we have to pay for the precision of the t-test in small samples 
is the assumption of normality in the parent. If the population is not normal we may still, 
of course, consider the distribution of “ Student’s ” ratio, which will remain independent 
of the scale parameter ; but complications appear because the parameters which express 
the deviation from normality will, in general, appear in the sampling distribution.
Furthermore, the distributions of $\bar{x}$ and $s$ are no longer independent.

Let us in the first instance prove the last assertion which is due to Geary (19368), 
in the form: If the mean and variance in samples from a population are independent 
and the population has finite cumulants, it must be normal. 

From 11.13 we have

$$\kappa(2\,1^r) = \frac{\kappa_{r+2}}{n^r}, \qquad r > 0.$$

If mean and variance are independent, $\kappa(2\,1^r) = 0$ and hence $\kappa_{r+2} = 0$ for $r > 0$. Thus
the population must be normal. It is rather remarkable that we have not had to use
relations of the type $\kappa(2^s\,1^r) = 0$, $s > 1$, in arriving at this result and that we need only
assume independence for one size of sample.
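
The case $r = 1$ of this relation states that the covariance of the sample mean and the statistic $k_2$ is $\kappa_3/n$, so that the two cannot be independent in a skew population. The simulation below illustrates the point for an exponential parent with unit mean (for which $\kappa_3 = 2$) and, for contrast, a normal parent. It is a sketch only; the function names, sample size, number of repetitions and seed are assumptions.

```python
import random

def mean_and_k2(sample):
    """Sample mean and the k-statistic k2 (divisor n - 1)."""
    n = len(sample)
    m = sum(sample) / n
    k2 = sum((x - m) ** 2 for x in sample) / (n - 1)
    return m, k2

def cov_mean_variance(draw, n, repetitions=50_000, seed=12345):
    """Simulated covariance between the mean and k2 in samples of n."""
    rng = random.Random(seed)
    pairs = [mean_and_k2([draw(rng) for _ in range(n)]) for _ in range(repetitions)]
    mean_m = sum(m for m, _ in pairs) / repetitions
    mean_k = sum(k for _, k in pairs) / repetitions
    return sum((m - mean_m) * (k - mean_k) for m, k in pairs) / (repetitions - 1)

# Exponential parent with unit mean: kappa_3 = 2, so kappa_3 / n = 0.4 for n = 5.
cov_exponential = cov_mean_variance(lambda rng: rng.expovariate(1.0), n=5)

# Normal parent: kappa_3 = 0, so the covariance should be near zero.
cov_normal = cov_mean_variance(lambda rng: rng.gauss(0.0, 1.0), n=5)
```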


21.14. In the notation of Chapter 11 we write

$$t = \frac{\bar{x}\sqrt{\nu}}{s} = \frac{k_1\sqrt{\nu}}{\sqrt{\kappa_2}\left(1+\dfrac{k_2-\kappa_2}{\kappa_2}\right)^{\frac{1}{2}}}$$

and expand in terms of powers of $(k_2-\kappa_2)/\kappa_2$. The method follows that of 11.23 and we
find for the moments of $t$ about the parent mean, assumed zero, to order $\nu^{-2}$


1 3 
= {i ae a (2A, — 2A, + siana | 


My 


Vv 
2 nn, 2 : 
be = 1 ae + A?) tas — Ag od OAs aE + 6A2 A,) 
; Le. ie I - (21.8 
= 5 eo igemea + 1472) = (102 — 304, + 242, 
14 
+ 12072 + 44, — 132A, 4s — 622 + 168/2 A, + 12048) 
where $\lambda_r = \kappa_r/\kappa_2^{r/2}$.

If the parent form is symmetrical, cumulants of odd order vanish and we have, to
order $\nu^{-2}$ and first order terms in the $\lambda$’s,

$$\mu_1 = \mu_3 = 0,$$
$$\mu_2 = 1+\frac{2}{\nu}+\frac{6}{\nu^2}-\frac{2\lambda_4}{\nu^2}-\ldots = \frac{\nu-1}{\nu-3}-\frac{2\lambda_4}{\nu^2}, \tag{21.9}$$
$$\mu_4 = 3+\frac{18}{\nu}+\frac{102}{\nu^2}-\frac{2\lambda_4}{\nu}-\frac{30\lambda_4}{\nu^2}-\ldots = \frac{3(\nu-1)^2}{(\nu-3)(\nu-5)}-\frac{2\lambda_4}{\nu}-\frac{30\lambda_4}{\nu^2}.$$

Except for the terms in $\lambda_4$ these are the values of the moments of $t$ in “Student’s”
distribution, and it follows that for symmetrical parents which are not excessively lepto-
or platykurtic we should not expect the t-test to be invalidated. If the parent is skew
the situation may be different.


21.15. The general skew case has been considered by E. S. Pearson and Adyanthaya
(1928, 1929) from the experimental viewpoint and by Bartlett (1935a) and Geary (1936)
from the theoretical viewpoint. Various writers have derived exact distributions of $t$
in non-normal samples, but the sample numbers are, as a rule, trivially small and the
results of little practical value. Geary considers the population expressed by the first
two terms of the Gram-Charlier series:

$$dF = \frac{1}{\sqrt{2\pi}}\left\{1 - \frac{\kappa_3}{6}\,(3x - x^3)\right\} e^{-\frac12 x^2}\,dx \tag{21.10}$$

and assumes that powers of $\kappa_3$ above the first may be neglected. He finds (cf. Exercise
21.1) that the frequency function of $t$ in this population is equal to the “Student” form
plus a corrective factor


ks ih i. t dt 
& Viet D “( oot 


> . Gla 


v 




The integral of this factor from $-\infty$ to $-t$ is


K VM £2\—40+ 2) oy teu 
Ks __ See i | 
WiGers ll i -) ( sae e), : 22112) 


giving the correction to be applied. (Geary gives a table for some representative values.)
This, of course, depends on $\kappa_3$, but even where exact knowledge of the skewness is not
available we may sometimes safeguard against error by considering the correction for
plausible values of $\kappa_3$.


Other Uses of the t-distribution 


21.16. The usefulness of “Student's” $t$ derives from the fact that it is independent
of the scale parameter, and the simplicity of its distribution from the fact that it is the
ratio of two independent variates, the numerator distributed normally and the denominator
distributed in the Type III form. We shall see below (21.26) that these properties can
be used to test the difference of two means in normal populations with equal variance,
and in Chapter 22 we shall encounter a test of regression coefficients which is based on
the same properties.

We have also noted that “Student's” form can be used to test the significance of the
product-moment correlation (14.15) and the Spearman rank correlation $\rho$ (16.18). These
facts are, however, in a sense accidental. They do not derive from the expression of the
parameters concerned as the ratio of a normal to a Type III variate, but from the simpler
fact that the distributions are of the Type II form (symmetrical with finite range) and
hence can be transformed to the “Student” distribution, which is of Type VII. Symmetrical
distributions of finite range can often be represented very approximately by a
transformation to the “Student” form, especially if they tend to normality.


Test of a Variance in Normal Samples 
21.17. The distribution of the sample variance $s^2$ in normal samples is

$$dF = \frac{(\tfrac12 n)^{\frac12(n-1)}}{\Gamma\{\tfrac12(n-1)\}}\left(\frac{s^2}{\sigma^2}\right)^{\frac12(n-3)}\exp\left(-\frac{ns^2}{2\sigma^2}\right)d\left(\frac{s^2}{\sigma^2}\right),\qquad 0 \le s^2 < \infty. \tag{21.13}$$

Thus, given for consideration a value of $\sigma^2$ and an observed $s^2$, we can find the probability
that $s^2/\sigma^2$ is attained or exceeded and accept or reject $\sigma^2$ in the usual way. The
distribution function of (21.13) may be expressed as an incomplete $\Gamma$-function, or more
conveniently for statistical purposes in terms of $\chi^2$ ($= ns^2/\sigma^2$) with $\nu = n - 1$.


Example 21.2 


In Example 21.1 we found $s^2 = 0.0216$, $\nu = 14$. Could the data have arisen by chance
from a population in which the true variance is 0.01?

We have $\chi^2 = \dfrac{ns^2}{\sigma^2} = 32.4$, $\nu = 14$. From the diagram on p. 446 of vol. I we see
that the probability of such a value or greater is between 0.01 and 0.001, a very improbable
result; and hence we reject $\sigma^2 = 0.01$ as a value of the parent variance.
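
The arithmetic of this example can be checked with a short sketch. It assumes $n = \nu + 1 = 15$ and uses the closed form of the upper tail of $\chi^2$ that holds when the number of degrees of freedom is even; the function name is ours, not the book's.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with `df` degrees of freedom >= x), valid for even df only.

    For df = 2m the tail is exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!.
    """
    if df % 2 != 0:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! iteratively
        total += term
    return math.exp(-half) * total

n, s2, sigma2 = 15, 0.0216, 0.01  # n = nu + 1 is an assumption read off nu = 14
chi2 = n * s2 / sigma2
p_upper = chi2_upper_tail_even_df(chi2, df=n - 1)
print(round(chi2, 1), p_upper)
```

The computed tail probability lies between 0.001 and 0.01, in agreement with the reading taken from the diagram.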

Once again this type of inference can be justified by the theory of confidence intervals
since the probability

$$P\left\{\frac{ns^2}{\sigma^2} \ge 32.4\right\} < 0.01$$

is equivalent to

$$P\left\{\sigma^2 \le \frac{ns^2}{32.4}\right\} < 0.01.$$

In asserting that $\sigma^2$ was less than $ns^2/32.4$ (in our present case 0.01) we should be wrong
more than 99 times in 100 on the average.

There is a point of interest to note here. In Example 21.1 we considered a hypothesis
as to the mean $\mu$, and in the present example a hypothesis as to the variance $\sigma^2$. Had we
considered the two together, that is to say the compound hypothesis that $\mu = 16$ and
$\sigma^2 = 0.01$, we should have been in difficulties in justifying our procedure by reference to
confidence or fiducial intervals, since we could no longer assert that our conclusions were
right in an assigned proportion of cases. We have avoided this complication by considering
separately the hypotheses (a) that $\mu = 16$ whatever the variance, and (b) that
$\sigma^2 = 0.01$ whatever the mean. This resource is not as a rule open to us where non-normal
variation is concerned.


Tests of Normality 


21.18. In large samples we can group the data into ranges and compare the actual 
frequencies with those to be expected on the hypothesis of parent normality. This com- 
parison over the course of the frequency function is not satisfactory for small samples 
unless the grouping is so broad as to deprive the test of most of its efficacy. An alter- 
native is to compute some statistic of the sample and to examine how far it departs from 
the mean value to be expected on the hypothesis of parent normality. 

Consider, for instance, the statistic

$$t = \frac{k_3}{k_2^{3/2}}. \tag{21.14}$$

This is independent of the mean (because the k-statistics are so) and is also independent
of the scale parameter because it is “studentised”. In normal samples, therefore, the
distribution of $t$ is independent of mean and variance and thus depends only on the sample
number $n$. We have already given formulae for its mean and variance (Exercise 11.16,
vol. I, p. 289). In fact,

$$\mu_1(t) = \mu_3(t) = 0, \qquad \mu_2(t) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)}. \tag{21.15}$$

Since the distribution of $t$ is symmetrical we may, for moderate $n$, consider it as normally
distributed with zero mean and variance given by (21.15), and this will provide a test
(of a somewhat approximate kind) of normality in the parent from which the sample is
derived.


Example 21.3 


In the data of Examples 21.1 and 21.2 we have, for the sample moments about origin 
16, in units of 0-1 
m, = — 08 
$m_2 = 2.16$
$m_3 = 0.496$

whence

$$k_2 = \frac{n}{n-1}\,m_2 = 2.31429$$

$$k_3 = \frac{n^2}{(n-1)(n-2)}\,m_3 = 0.61319$$

and

$$t = \frac{k_3}{k_2^{3/2}} = 0.174.$$


The variance of $t$, from (21.15), is 0.3188 and its standard error accordingly about
0.57. The observed deviation from zero is considerably less than this, and we see no reason
to doubt the hypothesis of normality so far as this test is concerned.
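
As a check on the k-statistics of this example, the following sketch recomputes them from the sample moments. It assumes $n = 15$, $m_2 = 2.16$ and $m_3 = 0.496$ (the values that reproduce the quoted $k_2$ and $k_3$); the function name is illustrative only.

```python
def skewness_statistic(m2, m3, n):
    """Return (k2, k3, t) with t = k3 / k2**1.5, from sample central moments."""
    k2 = n / (n - 1) * m2
    k3 = n ** 2 / ((n - 1) * (n - 2)) * m3
    return k2, k3, k3 / k2 ** 1.5

# Assumed inputs: n = 15 observations, moments in units of 0.1
k2, k3, t = skewness_statistic(m2=2.16, m3=0.496, n=15)
print(round(k2, 5), round(k3, 5), round(t, 3))
```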


21.19. Another test of normality has been proposed by Geary (1935a), namely
the use of the ratio

$$w = \frac{\text{mean deviation}}{\text{standard deviation}}. \tag{21.16}$$

If the parent mean is zero, the parent value of $w$ is $\sqrt{2/\pi} = 0.79788$. The test has also
been adapted to the case when the parent mean is not zero, and tables provided for the
application of the test (Geary and Pearson, 1938).

Geary's ratio is directed towards detecting deviations from mesokurtosis in the parent.
The criterion based on $k_4/k_2^2$, which is a natural extension of that for skewness based on
$k_3/k_2^{3/2}$, is not very suitable for the purpose, since it has a skew distribution for quite high
values of $n$. The distribution of Geary's ratio tends to normality fairly rapidly
(cf. Exercise 21.2).
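
A minimal sketch of the ratio follows. It is an illustration rather than Geary's tabulated procedure: deviations are measured from the sample mean and both the mean deviation and the standard deviation use the divisor $n$, which are our assumptions.

```python
import math

def geary_ratio(xs):
    """Mean deviation divided by standard deviation (both about the sample mean, divisor n)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_dev = sum(abs(x - mean) for x in xs) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return mean_dev / std_dev

# Value of the ratio in a normal parent
NORMAL_PARENT_VALUE = math.sqrt(2 / math.pi)

print(round(NORMAL_PARENT_VALUE, 5))
print(geary_ratio([-2, -1, 0, 1, 2]))
```

A sample value well above or below $\sqrt{2/\pi}$ points to a platykurtic or leptokurtic parent respectively; the significance of the departure has to be judged from the tables cited above.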


Tests of Goodness of Fit 


21.20. In Chapter 12 we considered in some detail the use of $\chi^2$ in testing
correspondence between observation and hypothesis. If the hypothesis specifies the theoretical
values completely no question of estimation arises, and each cell contributing to $\chi^2$ could,
if so desired, be tested separately. From this point of view $\chi^2$ compounds into a single
test a number of tests of the kind already considered.

If the hypothesis does not specify the theoretical values completely, but leaves them
to be estimated in part from the data, some modification in the $\chi^2$-test is necessary. We
can now establish a result which in 12.13 was announced without proof: if the estimators
employed are maximum likelihood estimators, then for large samples the $\chi^2$-test of
significance retains its validity, provided that the number of degrees of freedom is reduced by
unity for every parameter estimated.

Suppose the hypothesis leaves unspecified a parameter $\theta$, and let $t$ be its maximum
likelihood estimator. Then if the theoretical frequencies based on the true value of $\theta$
are $\lambda$ and those based on $t$ are $\lambda'$, we may write


$$\chi^2 = \sum\frac{(l-\lambda)^2}{\lambda} \tag{21.17}$$

$$\chi'^2 = \sum\frac{(l-\lambda')^2}{\lambda'}. \tag{21.18}$$

$\chi^2$ is distributed as the sum of squares of $\nu$ normal variates with unit variance. The problem
is to find the distribution of $\chi'^2$. We have


1 1 
: 2 2 -_-_—_—- — 
f= y a1 € 7) 


and for large samples the difference between A and 2’ will be of order n~?. We then have, 
expanding the difference in terms of 66, to order n~}, 


oi 2 /au'\* 1 824") (66)? , 
oa aa a 36 + 19 ( 55) Tat pet: i 


Now for large samples the maximisation of the likelihood is equivalent to minimising ,?, 


and hence 
12 aa’ 
= (ix) = ° 


ra (80)? Sf 2/aar\2? ara” 
ers ee ae ead 
a 2 21 2(45) sai 
1 / a2" \2 
= 2 a= = 
— (36) zEiz( 3 \ a eee) 


But the sum on the right is the reciprocal of the variance of the maximum likelihood esti- 
mator, and writing dét for 60, as is legitimate for large samples, we have 


; (dt)? 
Py a of : 
x x var t 


and 


. (21.21) 


The quantity on the right is itself the square of a variate which (in the limit) is normal
and has unit variance. Furthermore, its distribution is independent of that of $\chi'^2$. For
consider the spherically symmetric density-distribution of the $\nu$ normal variables whose
sum of squares composes $\chi^2$. Let O be the origin and P any point; then $\chi^2 = OP^2$. Now
for large samples the variation takes place in the neighbourhood of O. A surface of
constant $t$ through P is approximately plane in the effective range of variation. If OQ is the
normal to this surface,

$$OP^2 = OQ^2 + PQ^2,$$

corresponding to

$$\chi^2 = \frac{(dt)^2}{\operatorname{var} t} + \chi'^2,$$

for $t$ is chosen so as to minimise $\chi'^2 = PQ^2$. Thus if we take $t$ as a new co-ordinate, together
with $(\nu - 1)$ others in the surface of constant $t$, the axis of $t$ is orthogonal to the space of
constant $t$, and $t$ will be independent of $\chi'^2$.

It follows further that $\chi'^2$ is distributed as the sum of $(\nu - 1)$ squares of normal
variates. Thus the usual Type III distribution of $\chi^2$ holds for $\nu - 1$ degrees of freedom;
and so for every constant fitted, with a reduction of unity in the number of degrees for
each constant. We have already exemplified the use of the result in Example 12.4 (Vol. I).


The $\omega^2$-distribution

21.21. For small samples the $\chi^2$-test is difficult to apply, since it depends for its
validity on the fact that the binomial distribution in individual cells may be represented
by the normal distribution, and hence requires that cell-frequencies shall not be small.
A test of a different kind has been proposed by Cramér (1928) and independently by von
Mises (1931).
Put

$$\omega^2 = \int_{-\infty}^{\infty}\{\bar F(x) - F(x)\}^2\,dx, \tag{21.22}$$

where $\bar F(x)$ is the observed distribution function and $F(x)$ the hypothetical distribution
function. The quantity $\omega^2$ varies from sample to sample, its mean value being

$$E(\omega^2) = \frac{1}{n}\int_{-\infty}^{\infty}F(x)\{1 - F(x)\}\,dx = \frac{1}{2n}\,\Delta_1, \tag{21.23}$$

where $\Delta_1$ is Gini's coefficient of mean difference (cf. 2.24). For

$$E(\omega^2) = \int_{-\infty}^{\infty}E(\bar F - F)^2\,dx.$$

For any given $x$ the expectation of $(\bar F - F)^2$ is merely the variance of the proportion $\bar F$
and hence is equal to $F(1-F)/n$. The result (21.23) follows at once.

The $\omega^2$-test consists of comparing the observed with the mean value; but it is not
possible to express the comparison in terms of probability as the sampling distribution
of $\omega^2$ is not known.


21.22. The numerical evaluation of the integral (21.22) is tedious in the case of a
continuous distribution, and Wold (1938a) has suggested a modification. If the variate
range is divided into intervals at $-\infty, x_1, x_2, \ldots x_j \ldots \infty$, we define


a {F(a F (ays : 2 ; . (21,24) 
If the intervals are all of width h, 
1 
2 — — — 
(wt) = — al? (e){1—F(a)}de+—R, .  . —. (21.25) 


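where $R$ is a remainder term. If this may be neglected, the modified test is equivalent to the
$\omega^2$-test but easier to apply. If the data are ungrouped, the $x_j$'s may be taken at equidistant
intervals.

In the particular case when $F$ is normal, we have

$$n\,E(\omega^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{x}\int_{x}^{\infty}\frac{1}{2\pi}\,e^{-\frac12(u^2+v^2)}\,dv\,du\,dx. \tag{21.26}$$

Putting $u = \alpha + x$ and $v = \beta + x$, we find after integration with respect to $x$,

$$\frac{1}{2\sqrt{\pi}}\int_{-\infty}^{0}\int_{0}^{\infty}\exp\{-\tfrac14(\alpha-\beta)^2\}\,d\beta\,d\alpha.$$

A further substitution of $\gamma = \alpha - \beta$ and $\delta = \alpha + \beta$ gives

$$\frac{1}{4\sqrt{\pi}}\int_{-\infty}^{0}\int_{\gamma}^{-\gamma}e^{-\frac14\gamma^2}\,d\delta\,d\gamma
= \frac{1}{2\sqrt{\pi}}\int_{0}^{\infty}\gamma\,e^{-\frac14\gamma^2}\,d\gamma
= \frac{1}{\sqrt{\pi}}. \tag{21.27}$$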
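
The value $1/\sqrt{\pi}$ for a normal parent with unit variance can be confirmed numerically. The sketch below is only a check: it integrates $F(x)\{1 - F(x)\}$ by the trapezoidal rule over a range we have chosen wide enough for the tails to be negligible.

```python
import math
from statistics import NormalDist

def integral_f_one_minus_f(lo=-10.0, hi=10.0, steps=4000):
    """Trapezoidal estimate of the integral of F(x){1 - F(x)} for the unit normal F."""
    unit_normal = NormalDist()
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        F = unit_normal.cdf(lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0   # end points carry half weight
        total += weight * F * (1.0 - F)
    return total * h

value = integral_f_one_minus_f()
print(value, 1 / math.sqrt(math.pi))
```

Dividing by $n$ gives the mean value of $\omega^2$ in samples of $n$ from a unit normal population.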




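21.23. An interesting modification of the $\omega^2$-test has been given by Smirnoff (1936)
who defines

$$\omega^2 = \int_{-\infty}^{\infty}\{\bar F(x) - F(x)\}^2\,dF(x), \tag{21.28}$$

where, as before, $\bar F$ is the observed and $F$ the hypothetical distribution function.
The difference lies in the differential element which has the effect of rendering
the distribution of $\omega^2$ independent of $F$. It is shown that as $n$ tends to infinity the
distribution function of $\omega^2$ tends to the form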


= 2kn —}e*u", 
! | CE 
ics 


i pe 
% fond J (2k—1)n V/(— 2 sin 2) 


but this does not look a very promising formula for application in particular cases. 
Cramér (1928) has extended formula (21.27) to the goodness of fit of Gram-Charlier 
series and gives some examples of fitting to observed distributions. 


Difference of Two Means 


21.24. A common case occurring in practice is that of two independent samples of
$n_1$ and $n_2$ members from two populations which may or may not be different. We wish
to decide whether the evidence indicates a significant difference between the parent means.
This situation forms a kind of border-line case between the testing of a prior value of a 
parameter and the homogeneity tests which we shall consider below. It is a test of homo- 
geneity in the sense that we are to discuss the question whether two populations are equal 
in certain respects ; but we do not necessarily assume that they are identical, and in any 
case we can regard the problem as equivalent to the testing of a single parameter (the 
difference of the means) to see whether it is different from zero. 


21.25. For large samples we discussed the question in Example 9.10 (Vol. I, p. 226) 
and gave two tests. If the hypothesis is that the parent populations are identical (a true 
hypothesis of homogeneity) we may pool the samples to form a single sample and test 
whether either mean differs from the mean of the total. If, however, we wish to test the 
less general hypothesis that the parents have the same mean but not necessarily the same 
variance, we may test the difference of means by the ordinary equation expressing the
variance of a difference in terms of the separate variances. This is not a homogeneity test 
in the strictest sense of the word, but tests of such a character may conveniently be dis- 
cussed in conjunction with the other type, both for small and for large samples. 


21.26. We now consider the corresponding problem when the samples are small
and the parent populations are assumed to be normal. In the first place we take the
case when the two populations have the same variance $\sigma^2$.

The sample means $\bar x_1$ and $\bar x_2$ are distributed normally with variances $\sigma^2/n_1$ and $\sigma^2/n_2$ and
means $\mu_1$ and $\mu_2$. Consequently

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma}$$

is distributed normally with variance $\dfrac{1}{n_1} + \dfrac{1}{n_2}$, and hence

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \tag{21.30}$$

is distributed normally with unit variance about zero mean. Further, if $S_1^2$ and $S_2^2$ are
the sample sums of squares about the mean, the quantity

$$\frac{S_1^2 + S_2^2}{\sigma^2} \tag{21.31}$$

is distributed as $\chi^2$ with $n_1 + n_2 - 2$ degrees of freedom, independently of the expression
(21.30). It follows that

$$u = \{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)\}\sqrt{\frac{n_1 n_2}{n_1 + n_2}}\Bigg/\sqrt{\frac{S_1^2 + S_2^2}{n_1 + n_2 - 2}} \tag{21.32}$$

is distributed like “Student's” $t$ with $\nu = n_1 + n_2 - 2$ degrees of freedom. This expression
does not contain the unknown $\sigma$ and hence may be used to test the difference $\mu_1 - \mu_2$.
This result is due to Fisher (1926a).


Example 21.4
In a class of 20 children, 10 chosen at random were given a ration of orange-juice
each day for a certain period and the other 10 a ration of milk. Their gains in weight
during the period were, in pounds:
First group: 4, 2½, 3½, 4, 1½, 1, 3½, 3, 2½, 3½
Second group: 1½, 3½, 2½, 3, 2½, 2, 2, 2½, 1½, 3
The mean increase in the first group is 2.9 pounds, and in the second 2.4 pounds. Putting
aside other explanations, one possible factor accounting for this difference is the difference
in treatments. But we wish to know in the first place whether this is significant. We
assume, then, that treatment exerted no differential effect and that the samples came
from normal populations with the same mean and variance. We find

$$\bar x_1 = 2.9, \qquad \bar x_2 = 2.4,$$

$$\Sigma(x_1 - \bar x_1)^2 = 9.4, \qquad \Sigma(x_2 - \bar x_2)^2 = 3.9.$$

Hence, from (21.32), with $\mu_1 - \mu_2 = 0$,

$$u = 0.5\,\sqrt{\frac{100}{20}}\,\sqrt{\frac{18}{13.3}} = 1.30.$$

From Appendix Table 3 (vol. I, p. 441) we see that such a value would be exceeded in
absolute value with probability 0.21. The difference of a half-pound between the sample
means is not significant.
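
The computation of this example may be sketched as follows. The gains are entered as halves of a pound (a reading of the printed fractions that reproduces the stated means 2.9 and 2.4), and the function names are ours.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]    # orange-juice group
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]   # milk group

def sum_of_squares(xs):
    """Sum of squares of the observations about their own mean."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)

def pooled_u(x1, x2, mean_difference=0.0):
    """Statistic (21.32): Student's t with n1 + n2 - 2 degrees of freedom under equal variances."""
    n1, n2 = len(x1), len(x2)
    diff = sum(x1) / n1 - sum(x2) / n2 - mean_difference
    pooled = (sum_of_squares(x1) + sum_of_squares(x2)) / (n1 + n2 - 2)
    return diff * math.sqrt(n1 * n2 / (n1 + n2)) / math.sqrt(pooled)

u = pooled_u(first, second)
print(round(u, 2))
```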

We note incidentally that the sample variances, 0.940 and 0.390, differ considerably,
and shall see below how the significance of the difference may be tested. At the present 
stage our conclusion as to the non-significance of the difference of means is to be regarded 
with reserve, for the data themselves suggest that we have over-simplified the problem 
in assuming equal variance in the two populations. 


21.27. Apart from the question of unequal variances, the data of the previous 
example will serve to illustrate a further point of interest. Our hypothesis is that the 
children within each group may be regarded as a sample from a population with the same 
mean. Had we been dealing with a sample of, say, seedlings grown from the seed of a 
single plant, this hypothesis would not have been unreasonable ; but children differ very 
much among themselves in nutritional standard, and so forth. Our hypothesis is again 
liable to over-simplify the problem. 




When the statistician can direct the sampling himself, this kind of problem can be 
tackled with success by pairing. Suppose we select children in pairs of the same sex, 
each pair resembling each other as closely as possible in all the factors which might influence 
the experiment such as age, weight and nutritional standard. We allot at random one 
member to the first group and one to the second, and so for each pair. The differences 
in weights gained between members of a pair may then be regarded as samples from 
a population with zero mean, even if the pairs differ among themselves, and the set of 
differences tested in the usual way. 


Example 21.5 


Suppose that, in the previous example, the data had related to 10 pairs of children,
thus:

Pair      First Group    Second Group    Difference
  1           4              1½              2½
  2           2½             3½             −1
  3           3½             2½              1
  4           4              3               1
  5           1½             2½             −1
  6           1              2              −1
  7           3½             2               1½
  8           3              2½              ½
  9           2½             1½              1
 10           3½             3               ½
Totals       29             24               5

For the values in the last column we find

$$\bar x = 0.5, \qquad s^2 = 1.25, \qquad \nu = 9,$$

$$t = \frac{0.5}{\sqrt{1.25}}\,\sqrt{9} = 1.34.$$

The probability of obtaining such a value or greater (absolutely) is about 0.22, and
the observed differences are therefore not significant. This is the same conclusion that
we reached in Example 21.4, but it would not have been surprising had the conclusions
differed, for they relate to different questions.
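
The paired comparison can be sketched in the same way. The pairing of the two lists by position is the hypothetical pairing of this example, the gains are again read as halves of a pound, and $s^2$ is taken with divisor $n$ so that $t = \bar x\sqrt{\nu}/s$.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]

def paired_t(x1, x2):
    """Student's t for the mean of paired differences, with nu = n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(x1, x2)]
    n = len(diffs)
    mean = sum(diffs) / n
    s2 = sum((d - mean) ** 2 for d in diffs) / n      # divisor n
    return mean, s2, mean / math.sqrt(s2) * math.sqrt(n - 1)

mean_diff, s2, t = paired_t(first, second)
print(mean_diff, s2, round(t, 2))
```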


Difference of Means when Variances are Unequal
21.28. When population variances are not assumed equal the $t$-test of difference
of means no longer applies. We can, if we choose, apply a test based on fiducial intervals,
namely, the Behrens test, considered in the previous chapter. We put

$$d = \frac{\bar x_1 - \bar x_2}{\sqrt{(s_1^2 + s_2^2)}}. \tag{21.33}$$

The fiducial limits of $d$ for various significance levels have been tabulated by Sukhatme
(1938b) and Fisher (1941a) for $n_1$ and $n_2$ greater than 5. If the observed $d$ falls inside the
range, we may accept the hypothesis that the population means are equal.


21.29. As we have seen, an inference of this kind does not imply that we shall be 
correct in a certain proportion of the cases, and if we wish to find a test satisfying such 
a criterion a different approach is necessary. The following investigation is due to Welch 
(19385). 

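Consider the distribution of $u$ of equation (21.32) when the means are the same but
the variances are different, i.e.

$$u = \frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{S_1^2 + S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}. \tag{21.34}$$

Put

$$\chi' = (\bar x_1 - \bar x_2)\Big/\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^{1/2} \tag{21.35}$$

$$w = \frac{(\sigma_1^2\chi_1^2 + \sigma_2^2\chi_2^2)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}{(n_1 + n_2 - 2)\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)}, \tag{21.36}$$

where $\sigma_1^2\chi_1^2 = S_1^2$ and hence $\chi_1^2$ is distributed as $\chi^2$ with $\nu_1 = n_1 - 1$ degrees of freedom,
and similarly for $\chi_2^2$. $\chi'$ may be regarded as a single normal variate with zero mean and
unit variance. We have then

$$u = \frac{\chi'}{\sqrt{w}}. \tag{21.37}$$

Now put

$$w = a\chi_1^2 + b\chi_2^2, \tag{21.38}$$

where, from (21.36),

$$a = \frac{\sigma_1^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}{(n_1 + n_2 - 2)\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)}, \qquad
b = \frac{\sigma_2^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}{(n_1 + n_2 - 2)\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)}. \tag{21.39}$$

$w$ itself is not distributed in the Type III form unless $\sigma_1 = \sigma_2$, but we will find a distribution
of that form which approximates to it by equating lower moments. The first two moments
of $w$, being the sum of the separate parts, are

$$\mu_1'(w) = a\nu_1 + b\nu_2, \qquad \mu_2(w) = 2\,(a^2\nu_1 + b^2\nu_2). \tag{21.40}$$

The moments of

$$dF = \frac{1}{(2g)^{\frac12\nu}\,\Gamma(\tfrac12\nu)}\,w^{\frac12\nu - 1}\,e^{-w/2g}\,dw$$

are

$$\mu_1' = g\nu, \qquad \mu_2 = 2g^2\nu. \tag{21.41}$$

Identifying (21.40) and (21.41) we find:

$$g = \frac{a^2\nu_1 + b^2\nu_2}{a\nu_1 + b\nu_2}, \qquad \nu = \frac{(a\nu_1 + b\nu_2)^2}{a^2\nu_1 + b^2\nu_2}. \tag{21.42}$$

With these values of $g$ and $\nu$ the distribution of $w/g$ is approximately of the Type III form
with $\nu$ degrees of freedom and will be independent of $\chi'$. Hence,

$$\frac{\chi'\sqrt{\nu}}{\sqrt{(w/g)}} = u\sqrt{(g\nu)} \tag{21.43}$$

is distributed approximately as “Student's” $t$ with $\nu$ degrees of freedom. In particular,
if $\sigma_1 = \sigma_2$, $a = b$ and we reduce to the test of 21.26.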


21.30. In general, when $\sigma_1 \ne \sigma_2$ the quantities $g$ and $\nu$ depend on the ratio
$\theta = \sigma_1^2/\sigma_2^2$. We have

$$\nu = \frac{(\nu_1\theta + \nu_2)^2}{\nu_1\theta^2 + \nu_2} \tag{21.44}$$

and may put $u = ct$ where $c = 1/\sqrt{(\nu g)}$, and hence

$$c^2 = \frac{(n_1 + n_2 - 2)\left(\dfrac{\theta}{n_1} + \dfrac{1}{n_2}\right)}{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)(\nu_1\theta + \nu_2)}. \tag{21.45}$$

Without a definite knowledge of $\theta$ we cannot apply the $t$-test, but the advantage of putting
the expressions in this form is that by considering particular values of $\theta$ we are able to
judge how far the test based on “Student's” distribution is likely to be affected.
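
The dependence of $\nu$ and $c$ on $\theta$ can be tabulated with a short sketch. It assumes that $a$ and $b$ are the weights of $\chi_1^2$ and $\chi_2^2$ in $w$ as defined by (21.36), takes $\sigma_2^2 = 1$ and $\sigma_1^2 = \theta$ without loss of generality, and uses a function name of our own.

```python
import math

def u_approximation(n1, n2, theta):
    """Return (nu, c) such that u/c is approximately Student's t with nu degrees of freedom.

    theta is the variance ratio sigma1^2 / sigma2^2; sigma2^2 is set to 1.
    """
    nu1, nu2 = n1 - 1, n2 - 1
    scale = (n1 + n2 - 2) * (theta / n1 + 1.0 / n2)
    a = theta * (1.0 / n1 + 1.0 / n2) / scale   # weight of chi_1^2 in w
    b = (1.0 / n1 + 1.0 / n2) / scale           # weight of chi_2^2 in w
    g = (a * a * nu1 + b * b * nu2) / (a * nu1 + b * nu2)
    nu = (a * nu1 + b * nu2) ** 2 / (a * a * nu1 + b * b * nu2)
    return nu, 1.0 / math.sqrt(nu * g)

for theta in (0.1, 1.0, 10.0):
    print(theta, u_approximation(10, 10, theta), u_approximation(5, 15, theta))
```

For equal sample numbers $c$ remains at unity whatever $\theta$, and only the degrees of freedom change; for unequal sample numbers both change.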


Example 21.6 (from Welch, 1938)
Consider the case $n_1 = n_2 = 10$. From (21.45) we have $c = 1$ and from (21.44)

$$\nu = \frac{9\,(\theta + 1)^2}{\theta^2 + 1}.$$

Suppose now we were to use the test of 21.26, based on the assumption that $\theta = 1$. We
should find, to a probability level of 0.05, that $|u|$ must exceed 2.101 to be significant.
If we judge $u$ significant for such values how far are we in error when $\theta$ is not unity? That
is to say, what are the true probabilities

$$P\{|u| > 2.101\}$$

for varying values of $\theta$, as compared with our value of 0.05?

For a specified $\theta$ the probabilities can easily be obtained from the approximate
distribution of $u\sqrt{(g\nu)}$ of equation (21.43). They are shown graphically in Fig. 21.1. The full
line (a) shows $P$ for various values of $\theta$ and $n_1 = n_2 = 10$. The full line (b) shows similarly
the values for $n_1 = 5$, $n_2 = 15$. (The dotted line (c) we refer to below.)

[Fig. 21.1. Values of P plotted against values of θ (logarithmic scale).]

In case (a) the line does not deviate very much from the horizontal at $P = 0.05$, and we may
conclude that the test based on the assumption of equal variance is not very much in error. In
any case, if the curve falls below the line $P = 0.05$ we are on the safe side, for our true
probability is then less than 0.05, and in rejecting the hypothesis at that level we are adopting
more stringent standards than is apparent.

In case (b), when the sample numbers are unequal we have a different
state of affairs. For $\theta < 1$ the test is very conservative, but for $\theta > 1$ it may err very
seriously in the wrong direction.


21.31. Welch concludes that for samples of equal size there is not a serious likelihood
of error in testing the difference of means as if the parent variances were equal. For
samples of unequal size the error may invalidate the $t$-test and an alternative criterion is
proposed. Write

$$v = \frac{\bar x_1 - \bar x_2}{\left\{\dfrac{S_1^2}{n_1(n_1 - 1)} + \dfrac{S_2^2}{n_2(n_2 - 1)}\right\}^{1/2}}. \tag{21.46}$$

Here, it will be observed, the denominator is an estimate of $\left(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\right)^{1/2}$, the standard
deviation of the difference $\bar x_1 - \bar x_2$. Precisely as for $u$ we approximate to the distribution
of this denominator by a Type III form. Corresponding to (21.39) we find

$$a = \frac{\sigma_1^2}{n_1(n_1 - 1)}\Bigg/\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right), \qquad
b = \frac{\sigma_2^2}{n_2(n_2 - 1)}\Bigg/\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right). \tag{21.47}$$

Corresponding to (21.45) we find $c = 1$, and to (21.44)

$$\nu = \frac{\left(\dfrac{\theta}{n_1} + \dfrac{1}{n_2}\right)^2}{\dfrac{\theta^2}{n_1^2\nu_1} + \dfrac{1}{n_2^2\nu_2}}.$$

$v$ is then distributed approximately in “Student's” form with $\nu$ degrees of freedom. The
dotted line (c) in Fig. 21.1 shows the relationship between $\theta$ and $P\{|v| > 2.101\}$ for
$n_1 = 5$, $n_2 = 15$. Clearly the error is now much smaller than when we used $u$ for the same
sample numbers.
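
A sketch of the criterion $v$ and of its approximate degrees of freedom follows. The data are those of Example 21.4 with the gains read as halves of a pound, the degrees of freedom are written as a function of an assumed variance ratio $\theta = \sigma_1^2/\sigma_2^2$, and the function names are illustrative.

```python
import math

def welch_v(x1, x2):
    """Difference of means divided by the estimated standard deviation of that difference."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    return (m1 - m2) / math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))

def welch_degrees_of_freedom(n1, n2, theta):
    """Approximate degrees of freedom of v for a given variance ratio theta."""
    nu1, nu2 = n1 - 1, n2 - 1
    return (theta / n1 + 1.0 / n2) ** 2 / (theta ** 2 / (n1 ** 2 * nu1) + 1.0 / (n2 ** 2 * nu2))

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
v = welch_v(first, second)
print(round(v, 2), welch_degrees_of_freedom(10, 10, 1.0), welch_degrees_of_freedom(5, 15, 1.0))
```

With equal sample numbers $v$ coincides numerically with $u$ of (21.32); the two criteria differ only when $n_1 \ne n_2$.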




Difference of Two Variances in Normal Samples 


21.32. If we have samples of $n_1$ and $n_2$ members from normal populations with
variances $\sigma_1^2$ and $\sigma_2^2$, the ratio of sample variances $p^2 = \dfrac{s_1^2}{s_2^2}$ is distributed in the form (cf.
Example 10.18, vol. I, p. 249):

$$dF \propto \frac{p^{\,n_1 - 2}\,dp}{\left(\dfrac{n_1 p^2}{\sigma_1^2} + \dfrac{n_2}{\sigma_2^2}\right)^{\frac12(n_1 + n_2 - 2)}}. \tag{21.49}$$

The related quantity

$$z = \tfrac12\log_e\left\{\frac{n_1(n_2 - 1)}{n_2(n_1 - 1)}\,p^2\right\} \tag{21.50}$$

is distributed in Fisher's form

$$dF \propto \frac{e^{\nu_1 z}\,dz}{\left(\dfrac{\nu_1 e^{2z}}{\sigma_1^2} + \dfrac{\nu_2}{\sigma_2^2}\right)^{\frac12(\nu_1 + \nu_2)}}, \tag{21.51}$$

where $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$. The $\nu$'s may, by a convenient extension of our previous
terminology, be called the degrees of freedom associated with $z$. In practice, $z$ is generally
used in preference to $p$, but tables of both are available.

These distributions provide a test of the hypothesis that the ratio $\sigma_1^2/\sigma_2^2$ is unity.
On the hypothesis of equality they are independent of the ratio and the probability of
an observed $p$ or $z$ can be obtained. As usual, if this is small we reject the hypothesis.
We leave it to the reader to show that this type of inference can be based on the theory
of confidence intervals or the theory of fiducial intervals in the usual way.


Example 21.7 


In Example 21.4 we had two samples of children and found that the difference in
means was not significant. This was on the hypothesis that the variances were identical,
and since the two samples are equal in number the inference remains valid even if the
variances are different, as illustrated in 21.31. We will now test directly whether the
sample variances themselves indicate any significant difference in parent variances.

We have

$$\Sigma(x_1 - \bar x_1)^2 = 9.40, \qquad \nu_1 = 9$$

$$\Sigma(x_2 - \bar x_2)^2 = 3.90, \qquad \nu_2 = 9.$$

Hence

$$z = \tfrac12\log_e\left(\frac{9.40}{9}\Big/\frac{3.90}{9}\right) = 0.4398.$$

From Appendix Tables 4 and 5 of Vol. I (pp. 442-3) we see that for $\nu_2 = 9$ the 5-per-cent
points of $z$ are
$\nu_1 = 8$, 0.5862
$\nu_1 = 12$, 0.5613
and the 1-per-cent points are
$\nu_1 = 8$, 0.8494
$\nu_1 = 12$, 0.8157.
Thus, notwithstanding that one variance is about 2.4 times the other, the probability that
the observed $z$ will be exceeded on random sampling from populations with the same
variance is greater than 0.05, and the difference of sample variances is not significant.
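
The value of $z$ in this example is half the natural logarithm of the ratio of the two variance estimates, each a sum of squares divided by its degrees of freedom. A minimal sketch, with a function name of our own:

```python
import math

def fisher_z(sum_sq_1, nu1, sum_sq_2, nu2):
    """z = (1/2) log_e of the ratio of the two estimates of variance."""
    return 0.5 * math.log((sum_sq_1 / nu1) / (sum_sq_2 / nu2))

z = fisher_z(9.40, 9, 3.90, 9)
print(round(z, 4))
```

The larger estimate is placed in the numerator so that $z$ is positive, as the tables require.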




There is a point here which is frequently overlooked. In carrying out the z-test we 
always take the ratio of the larger variance to the smaller, so that our probability levels 
relate, not to the chance that a given pair of variances have a larger ratio than the observed 
one, but to the chance that the bigger of the two exceeds the smaller in a certain ratio. 
A probability of 0.05 thus relates to the chance that either $s_1^2/s_2^2$ exceeds a given amount
$k$, or $s_1^2/s_2^2$ falls short of a given amount $1/k$. If we are interested only in the former
contingency our probabilities should be halved.


Properties of Fisher's Distribution 


21.33. The z-distribution plays a very important part in statistical inference based 
on small samples, and we digress at this point to give an account of its main features. 

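The distribution function of $z$ may be obtained from the incomplete B-function, for
$z$ may be easily transformed into a Type I variate. There are, however, special tables
for lower values of $\nu_1$ and $\nu_2$ and satisfactory approximations of various kinds for higher
values.

The characteristic function of $z$ is proportional to

$$\int_{-\infty}^{\infty}\frac{e^{(\theta + \nu_1)z}\,dz}{(\nu_1 e^{2z} + \nu_2)^{\frac12(\nu_1 + \nu_2)}}$$

where $\theta = it$, and is thus

$$\phi(t) = \left(\frac{\nu_2}{\nu_1}\right)^{\frac12\theta}\frac{\Gamma\{\tfrac12(\nu_1 + \theta)\}\,\Gamma\{\tfrac12(\nu_2 - \theta)\}}{\Gamma(\tfrac12\nu_1)\,\Gamma(\tfrac12\nu_2)}. \tag{21.52}$$

Thus, taking logarithms and using the expansion

$$\log\Gamma(1 + x) = \tfrac12\log 2\pi + (x + \tfrac12)\log x - x + \frac{1}{12x} - \ldots,$$

we find

$$\log\phi(t) = -\frac{\theta}{2}\left(\frac{1}{\nu_1} - \frac{1}{\nu_2}\right) + \frac{\theta^2}{4}\left(\frac{1}{\nu_1} + \frac{1}{\nu_2}\right) + \ldots \tag{21.53}$$

Thus, for large $\nu_1$ and $\nu_2$, $z$ is distributed normally with mean
$-\tfrac12\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$ and variance $\tfrac12\left(\dfrac{1}{\nu_1} + \dfrac{1}{\nu_2}\right)$.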


21.34. Various approximations have been given for the case when $\nu_1$ and $\nu_2$ are
not large enough to justify the assumption of normality.

(a) (Cornish and Fisher, 1937). The method is that of 6.32 and depends on the
expansion of the distribution in a Gram-Charlier series. From the successive derivatives
of $\log\Gamma(1 + x)$ we can find those of $\phi(t)$, and hence ascertain the cumulants of $z$. Writing
$r_1 = \dfrac{1}{\nu_1}$ and $r_2 = \dfrac{1}{\nu_2}$, we find
1 = § (Pn Pa) ea et) 


ke =H(ri tre) +h (ri +73) +3 (73 + 73) 


2 


ee ae 3 3 
hey = (7, — 73) — (77 — 19) 
ky = as Gps) 


Ke, = 12 (ri + 19) 




Hence, putting $\sigma = r_1 + r_2$ and $\delta = r_1 - r_2$, we find for the $l$'s of 6.32 ($m = 0$,
variance $= \tfrac12\sigma$):


b= 4(o+2) +4 (or + 39%, 


and so on. After some reduction we find, for the value of $z$ corresponding to a probability
a (which in turn corresponds to a normal deviate $\xi$),


= 2 
z =2,/%— y+) - Jefze + 3é) + (ES + ug) | 


do 63 
— F5p (Et + 96% + 8) + a (884 + 784 + 16) + iat 


1929 © + 208% + I85) 
é4 


5 £3 a 
+ soa ( + 4489 + 183) + 


550 (9° — 28463 — isise) | . (21.55) 


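(b) (Fisher, extended by Cochran, 1940a). Writing $\nu$ indifferently for $\nu_1$ and $\nu_2$, we
have, from (21.55), to order $\nu^{-2}$:

$$z = \xi\sqrt{\frac{\sigma}{2}} - \frac{\delta}{6}(\xi^2 + 2) + \sqrt{\frac{\sigma}{2}}\left\{\frac{\sigma}{24}(\xi^3 + 3\xi) + \frac{\delta^2}{72\sigma}(\xi^3 + 11\xi)\right\}.$$

Put $h = 2/\sigma$. Then

$$z = \frac{\xi}{\sqrt{h}} - \frac{\delta}{6}(\xi^2 + 2) + \frac{\xi^3 + 3\xi}{12\,h^{3/2}} + \frac{\delta^2\sqrt{h}}{144}(\xi^3 + 11\xi). \tag{21.56}$$

Now

$$\frac{\xi}{\sqrt{(h - \lambda)}} = \frac{\xi}{\sqrt{h}} + \frac{\lambda\xi}{2h^{3/2}} + O(\nu^{-5/2}).$$

Hence, if we put

$$z = \frac{\xi}{\sqrt{(h - \lambda)}} - \frac{\delta}{6}(\xi^2 + 2), \tag{21.57}$$

the difference of this quantity from (21.56) is

$$\frac{(\xi^3 + 11\xi)\,\delta^2\sqrt{h}}{144},$$

provided that we take $\lambda = \dfrac{\xi^2 + 3}{6}$.

The difference is small in virtue of the large denominator and the factor $\delta^2 = \left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)^2$
which is small if $\nu_1$ and $\nu_2$ are not too different. Thus we may take $z$ as approximately
given by (21.57). The values of $\lambda$ for various values of the significance level are

Level   40%    30%    20%    10%    5%     1%     0.1%
λ       0.51   0.55   0.62   0.77   0.95   1.40   2.09

For the commoner levels of significance the form taken by (21.57) is

$$\text{20 per cent. level:}\quad z = \frac{0.8416}{\sqrt{(h - 0.62)}} - 0.4514\,\delta \tag{21.58}$$

$$\text{5 per cent. level:}\quad z = \frac{1.6449}{\sqrt{(h - 0.95)}} - 0.7843\,\delta \tag{21.59}$$

$$\text{1 per cent. level:}\quad z = \frac{2.3263}{\sqrt{(h - 1.40)}} - 1.235\,\delta \tag{21.60}$$

$$\text{0.1 per cent. level:}\quad z = \frac{3.0902}{\sqrt{(h - 2.09)}} - 1.925\,\delta \tag{21.61}$$

The accuracy of the approximation for $\nu_1 = 24$, $\nu_2 = 60$ may be judged from the following
comparison:

Level        Value of z from
per cent.    (21.57)            Exact Value
20           0.1337             0.1338
1            0.3748             0.3746
0.1          0.4966             0.4955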

(c) (Paulson, 1942). The Wilson—Hilferty approximation to y? of 12.7 indicates that 


2\3 eked 2. : 2 ee 
@ is distributed normally about mean 1 — ay with variance >" The ratio — itself 


83 
is the ratio of two independent quantities distributed as 7? with », and », degrees of free- 
dom. Further, in virtue of Geary’s theorem (Vol. I, p. 253) the ratio lig See Ce 


(of + 03 p?)t 
normally distributed in standard measure. 
We may thus regard 


ee 2a, 2 ; ° : . ~ (21.62) 
97. \ So 9, 


as approximately normally distributed in standard measure. The approximation seems 
remarkably good. For instance, the following shows the exact and approximate values 
of $s_1^2/s_2^2$ for $\nu_1 = 6$, $\nu_2 = 12$.

Level        $s_1^2/s_2^2 = p^3$, from
per cent.    (21.62).                    Exact Value.
20           1.72                        1.72
5            3.00                        3.00
1            4.85                        4.82
0.1          8.58                        8.38
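As a rough check of the Paulson approximation, the sketch below solves (21.62) for the variance ratio by bisection. It assumes (21.62) has the form $\{(1-b)p - (1-a)\}/(bp^2 + a)^{1/2}$ with $a = 2/(9\nu_1)$, $b = 2/(9\nu_2)$ and $p$ the cube root of the ratio; the function names are illustrative, and the search bounds are an arbitrary choice adequate for the levels tabulated here.

```python
from math import sqrt
from statistics import NormalDist


def paulson_u(f, nu1, nu2):
    """Normal deviate attached to a variance ratio f under the assumed form of (21.62)."""
    a, b = 2.0 / (9.0 * nu1), 2.0 / (9.0 * nu2)
    p = f ** (1.0 / 3.0)
    return ((1.0 - b) * p - (1.0 - a)) / sqrt(b * p * p + a)


def paulson_f_point(nu1, nu2, level, lo=1e-6, hi=1e4):
    """Variance ratio whose deviate equals the one-tailed normal point (bisection)."""
    target = NormalDist().inv_cdf(1.0 - level)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if paulson_u(mid, nu1, nu2) < target:   # the deviate increases with f
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for level in (0.20, 0.05, 0.01, 0.001):
    print(level, round(paulson_f_point(6, 12, level), 2))
```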




The Problem of k Samples 


21.35. We now proceed to consider the case when we have samples from k different
populations and wish to determine whether there is any evidence of significant differences
between those populations. In some cases the appropriate test can be carried out by the
$\chi^2$-distribution, particularly if the data are grouped. For the groups may then be regarded
as determining the rows of a contingency table and the different samples the columns, and 
a homogeneity test applied to the table in the manner of Chapter 12. Again, we may 
compare the samples pair by pair by the foregoing methods; but this, apart from being 
tedious, does not give us what we want, namely a test of homogeneity of the set of samples 
taken together. 


21.36. Consider in the first instance the sampling of attributes. Suppose we have
samples from populations in which the true proportions of successes are $\varpi_1, \ldots, \varpi_k$, the observed
proportions being $p_1, \ldots, p_k$ and the sample numbers $n_1, \ldots, n_k$, totalling n.

If p is the mean proportion of successes in all samples taken together, and our hypothesis
is that the populations have a common value $\varpi$, p will be an estimate of $\varpi$ and we have for
the variance of $p_j$:

$$\operatorname{var} p_j = \frac{\varpi(1-\varpi)}{n_j} = \frac{pq}{n_j} \text{ approximately}, \tag{21.63}$$

where $p = \dfrac{1}{n}\sum n_j p_j$ and $q = 1 - p$.

It follows that $(p_j - p)\sqrt{\dfrac{n_j}{pq}}$ will be distributed normally about zero mean with unit
variance, and hence

$$\chi^2 = \frac{\sum\{n_j (p_j - p)^2\}}{pq} \tag{21.64}$$

is distributed in the Type III form with k - 1 degrees of freedom (not k because we have lost a degree
by estimating p). Hence the ratio

$$Q^2 = \frac{\sum n_j (p_j - p)^2}{(k-1)\,pq} \tag{21.65}$$

has expectation unity. The quantity Q is called the Lexis ratio, after the author who
first discussed it in detail (Lexis, 1903).*


* Lexis first developed the use of Q in a paper "Über die Theorie der Stabilität statistischer Reihen,"
1879, Conrad's Jahrbücher, 32, 60, reproduced in the reference given above. He dealt, however, only
with the case when all the n's were equal and had no knowledge of the sampling distribution of Q. In
practical applications he took as each $n_j$ the average for the group. "The error thereby committed
can be judged by calculating Q once with the largest and once with the smallest base number."




Example 21.8
From 1910 to 1919 the numbers of live male and female births in England and Wales
were as follows:

Year.     Male Births.   Female Births.   Total Births.   Proportion Male/Total.
1910      457,266        439,696          896,962         0.5098
1911      448,933        432,205          881,138         0.5095
1912      445,004        427,733          872,737         0.5099
1913      449,159        432,731          881,890         0.5093
1914      447,184        431,912          879,096         0.5087
1915      415,205        399,409          814,614         0.5097
1916      402,137        383,383          785,520         0.5119
1917      341,361        326,985          668,346         0.5108
1918      339,112        323,549          662,661         0.5117
1919      356,241        336,197          692,438         0.5145

TOTALS    4,101,602      3,933,800        8,035,402       0.5104

The proportion of male births showed an increase during the war years 1916-1919.
This is a well-known effect of war, but suppose we had noticed it here for the first time.
The natural question is: can the effect be accidental? There is no doubt about its reality,
for the data cover the whole population; but if we suppose that sex at birth is distributed
according to the laws of chance, do the differences observed suggest that in the ten years
concerned there was a significant change in the population (as regards proportion of male
births)? Let us consider the homogeneity test applied to the 10 proportions.

We have p = 0.5104, n = 8,035,402, k - 1 = $\nu$ = 9 and the sum $\sum n_j (p_j - p)^2$ will
be found to be 19.895783. Hence

$$Q = \sqrt{\frac{19.895783}{9 \times 0.5104 \times 0.4896}} = 2.974,$$

$$\chi^2 = (k - 1)\,Q^2 = 79.618.$$

Q is sufficiently far from unity to reject decisively the hypothesis that the data are
homogeneous. A $\chi^2$-test will confirm the conclusion. We infer that, whatever the reason,
the differences in proportions of male births, slight as they are, cannot be accounted for
on the supposition that the distribution of sex is according to chance in samples from
a constant population. We may observe that, had we obtained the same proportions
for a sample one-tenth the size, $\chi^2$ would have been 7.962 and we should not have inferred
non-homogeneity.
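The Lexis ratio of this example can be recomputed from the birth counts, as in the sketch below. It works with unrounded proportions, so the sum of squares differs slightly from the 19.895783 quoted in the text; the function name `lexis_ratio` is illustrative and not part of the original.

```python
from math import sqrt

# (male births, total births) for 1910-1919, copied from the table of Example 21.8
births = [
    (457266, 896962), (448933, 881138), (445004, 872737), (449159, 881890),
    (447184, 879096), (415205, 814614), (402137, 785520), (341361, 668346),
    (339112, 662661), (356241, 692438),
]


def lexis_ratio(samples):
    """Return (Q, chi2) for k samples given as (successes, sample size) pairs."""
    n = sum(size for _, size in samples)
    p = sum(succ for succ, _ in samples) / n        # pooled proportion
    q = 1.0 - p
    ss = sum(size * (succ / size - p) ** 2 for succ, size in samples)
    k = len(samples)
    chi2 = ss / (p * q)                             # as in (21.64)
    return sqrt(chi2 / (k - 1)), chi2               # Q is the root of chi2/(k-1)


Q, chi2 = lexis_ratio(births)
print(round(Q, 3), round(chi2, 1))
```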
21.37. A similar test may be applied with k samples of variables. Let the samples be

$$\begin{array}{ll}
x_{11}, x_{12}, \ldots, x_{1n_1}, & \text{with mean } \bar{x}_1, \\
x_{21}, x_{22}, \ldots, x_{2n_2}, & \text{with mean } \bar{x}_2, \\
\quad\vdots & \\
x_{k1}, x_{k2}, \ldots, x_{kn_k}, & \text{with mean } \bar{x}_k.
\end{array}$$

The variance of the jth sample is

$$\frac{1}{n_j} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2,$$

and an estimate of the population variance may be obtained by taking the weighted mean
of sample variances

$$s_w^2 = \frac{1}{n - k} \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ji} - \bar{x}_j)^2. \tag{21.66}$$

Here we have reduced the divisor to n - k so as to correspond with the number of degrees
of freedom.

Furthermore $\bar{x}_j$ will be distributed with variance $\dfrac{\sigma^2}{n_j}$ and hence (assuming without
loss of generality that the parent mean is zero), $\bar{x}$ being the mean of all the samples together,

$$E \sum_{j=1}^{k} \{n_j (\bar{x}_j - \bar{x})^2\} = E\{\Sigma (n_j \bar{x}_j^2) - n\bar{x}^2\} = k\sigma^2 - \sigma^2 = (k - 1)\sigma^2.$$

Putting then

$$s_m^2 = \frac{1}{k - 1} \sum_{j=1}^{k} n_j (\bar{x}_j - \bar{x})^2, \tag{21.67}$$

we have another estimate of $\sigma^2$. Within sampling limits $s_w^2$ and $s_m^2$ should be equal. If
they are not, we suspect the homogeneity of the population.


21.38. The above test is a simple form of the analysis of variance, which we shall 
study extensively in Chapters 23 and 24; it is therefore unnecessary for us to develop it 
further at the present stage. Essentially the test is one of simultaneous significance of 
differences between means on the assumption that variances are constant. We shall also 
discuss in Chapter 26 a generalisation of the variance ratio for testing the homogeneity 
of a set of variances. 


Example 21.9 

The following table (from the Registrar-General’s Statistical Review of England and 
Wales for 1933, Part II) shows the numbers of males married in England in that year 
classified according to age and district. (Certain small numbers of unspecified age and 
those under 21 have been omitted.) 


                                  Age (Years).
District.       21-       25-       30-       35-      45-      55-      TOTALS.
South-East    31,714    43,979    14,995     7,985    3,928    3,717    106,318
North         31,507    39,849    13,620     7,108    3,362    2,916     98,362
Midland       17,465    21,486     6,729     3,340    1,624    1,509     52,153
East           4,016     5,297     1,820       962      457      386     12,938
South-West     4,323     6,065     2,218     1,177      514      580     14,877
TOTALS        89,025   116,676    39,382    20,572    9,885    9,108    284,648


Note the changes in interval at 25- and 35- years. 




The question we shall consider is whether age at marriage differs significantly between 
different districts. This might, for example, be an important point if we were about to 
sample the population for some quality related to age at marriage, such as the number 
of children per family. The data might be regarded as a contingency table and y? used 
as a test of independence in the usual way. Here we adopt an alternative by considering 
the mean age at marriage in the five different districts. 

Taking the centres of the intervals to be 23, 27.5, 32.5, 40, 50 and 57.5 years (the latter
being admittedly an approximation) and making no corrections for grouping, we find:

                                  Mean Age     Sum of Squares of
District.           Number.       (years).     Deviations from Mean.    Variance.
South-East          106,318       29.681799     7,092,490               66.710
North                98,362       29.312626     6,092,375               61.938
Midland              52,153       29.007344     3,105,520               59.546
East                 12,938       29.425761       807,911               62.445
South-West           14,877       29.873731     1,025,284               68.917
Whole population    284,648       29.429049    18,143,921               63.741


The total of the sum of squares about district means, $\Sigma\Sigma (x_{ji} - \bar{x}_j)^2$, is the sum of the
figures in the fourth column, namely 18,123,580. The sum of squares $\Sigma\, n_j (\bar{x}_j - \bar{x})^2$ is
found to be 20,341. We have the useful check that these two together are equal to the
sum of squares of deviations from the population mean, 18,143,921 (a property which we
shall often require in the analysis of variance).

Thus

$$s_w^2 = \frac{18{,}123{,}580}{284{,}643} = 63.67,$$

$$s_m^2 = \frac{20{,}341}{4} = 5085.25.$$

No test of significance is required to see that the difference in mean age at marriage between
districts is not a chance effect.
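The two variance estimates of this example can be recomputed from the grouped frequencies, as sketched below. Two readings of the scanned table are assumed: the fourth and fifth districts are taken as East and South-West (as in the second table), and the South-West entry for ages 35- is taken as 1,177, the value required by the row and column totals. The names `within_ss`, `between_ss`, `sw2` and `sm2` are illustrative.

```python
centres = [23.0, 27.5, 32.5, 40.0, 50.0, 57.5]   # interval centres used in the text
counts = {
    "South-East": [31714, 43979, 14995, 7985, 3928, 3717],
    "North":      [31507, 39849, 13620, 7108, 3362, 2916],
    "Midland":    [17465, 21486, 6729, 3340, 1624, 1509],
    "East":       [4016, 5297, 1820, 962, 457, 386],
    "South-West": [4323, 6065, 2218, 1177, 514, 580],   # 1177 inferred from the totals
}


def variance_estimates(groups, x):
    """Within- and between-sample sums of squares and the two variance estimates."""
    n = sum(sum(f) for f in groups.values())
    k = len(groups)
    grand = sum(fi * xi for f in groups.values() for fi, xi in zip(f, x)) / n
    within = between = 0.0
    for f in groups.values():
        nj = sum(f)
        mean_j = sum(fi * xi for fi, xi in zip(f, x)) / nj
        within += sum(fi * (xi - mean_j) ** 2 for fi, xi in zip(f, x))
        between += nj * (mean_j - grand) ** 2
    return within, between, within / (n - k), between / (k - 1)


within_ss, between_ss, sw2, sm2 = variance_estimates(counts, centres)
print(round(within_ss), round(between_ss), round(sw2, 2), round(sm2, 2))
```

The ratio of the between-sample estimate to the within-sample estimate is then about 80, which is why the text dispenses with a formal test.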


Tests of Random Order 


21.39. The tests described above are concerned with the values of a number of 
sample members but not with the order in which these values occur. Sometimes there 
may not be an order, as, for instance, if a number of plants are grown simultaneously or 
a number of names drawn from a hat in a single handful. More frequently there is a tem- 
poral order of appearance in the values, and it is clear that, on some occasions at least, 
the order may be material. To take an extreme case, suppose we are told that in a sample 
of 100 births 53 are male. We conclude that the sample is concordant with the hypothesis
that male and female births occur at random with probability ½. But if we knew in addition
that the first 53 births were male and the next 47 female we should almost certainly reject 
the hypothesis. 


21.40. If sampling is conducted by taking members one at a time from a population 
and the process is random, then any order is as probable as any other order. The sample 




may be considered as a section of an infinite series generated by the sampling process, and 
this series ought to behave like von Mises’ Irregular Kollektiv (7.15). It is a happy 
hunting-ground for the theorist, since there is no limit to the number of tests which can 
be invented to ascertain whether a given finite series conforms to the random scheme. We 
have considered a few such tests in connection with random sampling numbers (8.15) 
and shall discuss others in connection with time-series (Chapter 30). Here we discuss a 
few tests which are useful in detecting departures from randomness in the sampling. We 
are not now considering hypotheses as to the parent population, but since the randomness 
of the sampling is an essential element of inferences in probability it is convenient to 
consider the reliability of the sampling, together with inferences from the sample about 
the parent. 


Ranking Tests 


21.41. Suppose we have a sample of n members $x_1, \ldots, x_n$, in that order, and are
doubtful about its randomness. Such doubts may arise owing either to defects in the
sampling or to possible alterations in the population while the sampling is going on. In
the first case the process itself is at fault; in the second, circumstances are at work to make
the sample something other than it purports to be, a random sample from a single population.
Either influence may relate the magnitude of the x's to the order in which they
occur, and the values $x_1, \ldots, x_n$ are not then a random order in the sense that any other
order was equally probable.

Let us then consider all the possible orders, n! in number, of the observed values
$x_1, \ldots, x_n$. A proportion of these, determined by a significance level of 5 per cent. or
1 per cent., say, we will decide to reject as improbable; and we will select as the "improbable"
rankings those which exhibit the systematic appearance of which we are afraid,
and particularly the regular rise or fall from $x_1$ to $x_n$ in magnitude. In short, we rank the
sample in order of magnitude, say $X_1, \ldots, X_n$, where the X's are a permutation of
the first n integers, and compute a rank correlation coefficient between this order and the
order 1 ... n. If the coefficient is large in absolute value ("large" being determined
by the significance level) we suspect the sample of being subject to systematic influences.


Example 21.10 


Thirty persons in the income group £1000-£1500 are asked to supply returns of their 
annual income for some purpose connected with taxation. It is intended to summarise 
their replies by a given date, but when that date arrives only 20 answers have been received. 
This is a frequent event in postal inquiries, even when the return is compulsory, and it 
has to be decided whether the 20 returns may be accepted as representative of the 30. 
There are prior reasons for suspecting that persons with bigger incomes may delay more 
than the others, partly because of difficulty in completing returns and partly because of 
a natural reluctance to part with information which may tell against them.* We therefore
wish to ascertain from the 20 returns whether there is any evidence that persons with
smaller incomes tend to submit returns earlier than those with larger incomes. 

* This is an assumption for the purposes of the example and not intended as a statement about
taxation returns in real life.

Suppose the 20 returns give incomes, in that order, of £ per annum: 1180, 1270, 1400,
1090, 1190, 1250, 1170, 1300, 1290, 1310, 1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360,
1220, 1460. The ranking order is:

No. of sample     1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Rank of income    3   7  17   1   4   6   2  10   9  11   8  13  12  15  18  16  20  14   5  19
Difference       -2  -5 -14   3   1   0   5  -2   0  -1   3  -1   1  -1  -3   0  -3   4  14   1

The sum of squares of differences is 508 and thus the Spearman coefficient of rank
correlation between observed and natural order 1 ... n is

$$\rho = 1 - \frac{6 \times 508}{7980} = 0.618.$$

The probability of obtaining such a value or greater (16.18) may be found from "Student's"
distribution by putting

$$t = \rho \sqrt{\frac{n - 2}{1 - \rho^2}}, \qquad \nu = n - 2 = 18,$$

and is found from Appendix Table 3, vol. I, to be about 0.004. The test confirms our
suspicion that size of income is correlated with order of appearance, and if we intend to
use the mean income of the 20 returns as an estimate of the income in the full 30 we must
recognise that it may very well be an under-estimate.
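The ranking arithmetic of this example is easily mechanised. The sketch below ranks the twenty incomes, forms the differences from the order of arrival and evaluates Spearman's coefficient together with the corresponding "Student" ratio, assumed here to be $t = \rho\sqrt{(n-2)/(1-\rho^2)}$ as in (21.99). It supposes there are no tied values; the function name is illustrative.

```python
from math import sqrt

incomes = [1180, 1270, 1400, 1090, 1190, 1250, 1170, 1300, 1290, 1310,
           1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360, 1220, 1460]


def spearman_vs_order(values):
    """Spearman's rho between order of arrival 1..n and rank by magnitude (no ties)."""
    n = len(values)
    by_size = sorted(range(n), key=lambda i: values[i])
    rank = [0] * n
    for r, i in enumerate(by_size, start=1):
        rank[i] = r
    d2 = sum((i + 1 - rank[i]) ** 2 for i in range(n))   # sum of squared differences
    rho = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    t = rho * sqrt((n - 2) / (1.0 - rho * rho))
    return d2, rho, t


d2, rho, t = spearman_vs_order(incomes)
print(d2, round(rho, 3), round(t, 2))
```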


21.42. It will be noted in this example that we have made no assumption about 
the distribution of incomes in the sample or the population (the latter of which would 
certainly not be normal) and have used the sample values themselves without any reference 
to the question whether they were representative. This does not invalidate our inference, 
which is made within the population of samples obtained by permuting the observed values. 
(Cf. 17.44 and 17.45.) 


21.43. A second test of use in random series, particularly when it is suspected that
cyclical effects are present, may be obtained by counting the occurrences of "peaks" or
"troughs" in the series. A member is said to be a "peak" if it is greater than the two
neighbouring members, and a "trough" if it is less than those members. In either case
it is a "turning-point". The interval between turning-points is called a "phase".

Three consecutive observations are required to define a turning-point. If the series
is random the probability that any given three provides a turning-point is 2/3, for the values
$x_1, x_2, x_3$ may occur in six orders and in only four is the greatest or least value the middle
one. In a series of N terms there are N - 2 sets of three, and hence the expected number
of turning-points p is

$$E(p) = \tfrac{2}{3}(N - 2). \tag{21.68}$$

The variance and higher moments of p are not so easy to determine. Like the ranking
problems considered in Chapter 16 (to which the present problem is analogous), the
distributions resulting are rather complicated. We quote without proof the results

$$\mu_2(p) = \frac{16N - 29}{90}, \tag{21.69}$$

$$\mu_3(p) = -\frac{16(N + 1)}{945}, \tag{21.70}$$

$$\mu_4(p) = \frac{448N^2 - 1976N + 2301}{4725}. \tag{21.71}$$

As N tends to infinity the distribution tends to normality fairly rapidly, and p may,
for finite N, be taken as normally distributed about mean $\tfrac{2}{3}(N - 2)$ with variance
$\dfrac{16N - 29}{90}$.
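The mean and variance quoted for the number of turning-points can be checked by complete enumeration for a short series. The sketch below does this for N = 6 (720 orders) with exact fractions, assuming the readings $E(p) = \tfrac{2}{3}(N-2)$ and $\mu_2(p) = (16N-29)/90$; it is a numerical check for one value of N, not a proof, and the helper names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def turning_points(seq):
    """Count the members that are greater, or less, than both neighbours."""
    return sum(1 for a, b, c in zip(seq, seq[1:], seq[2:])
               if (b > a and b > c) or (b < a and b < c))


def exact_moments(n_terms):
    """Exact mean and variance of the count over all orders of n_terms distinct values."""
    counts = [turning_points(p) for p in permutations(range(n_terms))]
    total = len(counts)
    mean = Fraction(sum(counts), total)
    var = Fraction(sum(c * c for c in counts), total) - mean * mean
    return mean, var


mean6, var6 = exact_moments(6)
print(mean6, var6)
```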


21.44. A further test may be derived from the distribution of phase lengths. The
probability of a phase of length d in a series of d + 1 terms is clearly $\dfrac{2}{(d+1)!}$, for only
two of the possible permutations are favourable. In a series of length N there are
N - d - 2 possible phases of length d, for d + 3 points are required to determine the
phase. The probability of a phase d in d + 3 terms is

$$2\left\{\frac{1}{(d+1)!} - \frac{1}{(d+2)!} - \frac{1}{(d+2)!} + \frac{1}{(d+3)!}\right\} = \frac{2(d^2 + 3d + 1)}{(d+3)!}, \tag{21.72}$$

and hence the expected number of phases of length d is

$$\frac{2(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}. \tag{21.73}$$

Now the expected number of possible phases is

$$\tfrac{1}{3}(2N - 7) + \frac{2}{N!}, \tag{21.74}$$

for there is one fewer phase than turning-points, $\tfrac{2}{3}(N - 2)$ in number, and the whole
series may be a phase, which accounts for the term 2/N!. In practice this is negligible,
and for the probability of a phase d in a series of N we then have (21.73) divided by (21.74),
namely

$$\frac{6(d^2 + 3d + 1)(N - d - 2)}{(2N - 7)(d+3)!}. \tag{21.75}$$

The moments of this distribution are easily obtained to a very close approximation.
For example,

$$\begin{aligned}
\mu_1'(d) &= \frac{6}{2N-7} \sum_{d=1}^{N-3} \frac{d\,(N - d - 2)(d^2 + 3d + 1)}{(d+3)!} \\
&= \frac{6}{2N-7} \sum_{d=1}^{N-3} \Big[ (N-2)\{(d+3)(d+2)(d+1) - 3(d+3)(d+2) + 5(d+3) - 3\} \\
&\qquad - \{(d+3)(d+2)(d+1)d - 3(d+3)(d+2)(d+1) + 8(d+3)(d+2) - 13(d+3) + 9\} \Big] \Big/ (d+3)! \\
&= \frac{6}{2N-7} \Big[ (N-2) \sum \Big\{ \frac{1}{d!} - \frac{3}{(d+1)!} + \frac{5}{(d+2)!} - \frac{3}{(d+3)!} \Big\} \\
&\qquad - \sum \Big\{ \frac{1}{(d-1)!} - \frac{3}{d!} + \frac{8}{(d+1)!} - \frac{13}{(d+2)!} + \frac{9}{(d+3)!} \Big\} \Big].
\end{aligned}$$

In view of the rapid convergence of $\sum \dfrac{1}{d!}$ to e, we may write this as

$$\mu_1'(d) = \frac{6}{2N-7} \Big[ (N-2)\{(e-1) - 3(e-2) + 5(e - \tfrac{5}{2}) - 3(e - \tfrac{8}{3})\} - e + 3(e-1) - 8(e-2) + 13(e - \tfrac{5}{2}) - 9(e - \tfrac{8}{3}) \Big],$$

that is,

$$\mu_1'(d) = \frac{3(N - 4e + 7)}{2N - 7} \sim \tfrac{3}{2}. \tag{21.76}$$

Similarly we find

$$\mu_2(d) = \frac{3\{(8e - 21)N^2 + (4e - 17)N - (48e^2 - 140e + 14)\}}{(2N - 7)^2} \sim 0.560. \tag{21.77}$$
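The phase-length formulae lend themselves to a direct numerical check. The sketch below assumes that (21.73) is $2(N-d-2)(d^2+3d+1)/(d+3)!$ and (21.75) is $6(d^2+3d+1)(N-d-2)/\{(2N-7)(d+3)!\}$, and compares the mean and variance obtained by summation with the closed forms $3(N-4e+7)/(2N-7)$ and $3\{(8e-21)N^2+(4e-17)N-(48e^2-140e+14)\}/(2N-7)^2$ reconstructed from the derivation. The function names are illustrative.

```python
from math import e, factorial


def expected_phase_count(n_terms, d):
    """Expected number of phases of length d, in the form assumed for (21.73)."""
    return 2.0 * (n_terms - d - 2) * (d * d + 3 * d + 1) / factorial(d + 3)


def phase_probability(n_terms, d):
    """Probability of a phase of length d, in the form assumed for (21.75)."""
    return (6.0 * (d * d + 3 * d + 1) * (n_terms - d - 2)
            / ((2 * n_terms - 7) * factorial(d + 3)))


def phase_mean_and_variance(n_terms):
    """Mean and variance of phase length by summation over d = 1 .. N-3."""
    ds = range(1, n_terms - 2)
    mean = sum(d * phase_probability(n_terms, d) for d in ds)
    second = sum(d * d * phase_probability(n_terms, d) for d in ds)
    return mean, second - mean * mean


N = 48
mean_d, var_d = phase_mean_and_variance(N)
closed_mean = 3.0 * (N - 4.0 * e + 7.0) / (2 * N - 7)
closed_var = 3.0 * ((8 * e - 21) * N * N + (4 * e - 17) * N
                    - (48 * e * e - 140 * e + 14)) / (2 * N - 7) ** 2
print(round(mean_d, 4), round(var_d, 4))
```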
21.45. In comparing observed distributions of phases with expected values the
ordinary $\chi^2$-test cannot be applied, because the probabilities of the events in a finite series
are not independent. A test of significance has been derived by Wallis and Moore (1941),
who consider a grouping into three categories, d = 1, d = 2 and d $\geq$ 3. They conclude
that $\chi^2$ calculated from these three groups can be tested in the usual Type III form
with $\nu = 2\tfrac{1}{2}$ if $\chi^2 \geq 6.3$. For lower values $\tfrac{6}{7}\chi^2$ can be tested in that form with $\nu = 2$.
This test is independent of the law of distribution of the variables and is thus of general
application. It has to be remembered, however, that generality in these matters may
be offset by loss of sensitivity, and more searching tests may be required in certain cases.


Example 21.11 


The following table shows the deviations from a moving nine-year average of potato 
yields in England and Wales for the years 1888-1935 (units are th ton) :— 


| 

Year. Yield. Year. Yield. Yield. Year. | Yield. 

1888 — 6 1900 —7T —15T7 1924 —~ 1T7 
89 +2P 01 +6P qe a) ie 25 + 2P 
90 —4T7 02 —3 + 2 26 — 97f 
91 —3 03 —7T + 1 27 — 3 
92 —1 04 +2P — 27 28 qe hie 
93 +6P 05 (0) ge! qe a) Je 29 + 5 
94 —27 06 +1°P + 4 30 ae dl 
95 +7P 07 —7T — 47 31 —10T7 
96 +3 08 +8P — 3P 32 se 
97 —6T 09 + 4 — 97 33 ap 2 
98 +2P 10 +37 +11P 34 + 5P 
99 0 Wil +4P = Il } 35 — 4 


We have marked with P and T the peaks and troughs of the series. The observed
number of turning-points is 31 in a series of 48 terms. The expected number is, from
(21.68), $\tfrac{2}{3}(48 - 2) = 30.67$, almost exactly the number observed. No test of significance
is required.

The duration of phases is:

Length of phase    Observed    Predicted (21.75)
1                  20          18.75
2                   6           8.07
3 and over          4           3.18
Total              30          30.00

Here, again, a test is hardly necessary. We find, in fact, $\chi^2 = 0.826$, $\tfrac{6}{7}$ of which for
$\nu = 2$ is not significant.

We conclude that these tests provide no evidence against the randomness of the series 
and hence do not suggest any cyclical movement in the yields. 
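The value of $\chi^2$ for the phase durations can be verified as follows. The sketch takes the observed and predicted frequencies from the table of this example and, as an assumption about the Wallis and Moore rule, multiplies $\chi^2$ by 6/7 before referring it to two degrees of freedom, for which the upper tail probability is simply $e^{-x/2}$.

```python
from math import exp

observed = [20, 6, 4]             # phases of length 1, 2, and 3 and over
predicted = [18.75, 8.07, 3.18]   # predicted frequencies quoted in the example

chi2 = sum((o - p) ** 2 / p for o, p in zip(observed, predicted))
adjusted = 6.0 / 7.0 * chi2       # assumed adjustment for chi2 below 6.3
p_value = exp(-adjusted / 2.0)    # P(chi-square with 2 d.f. exceeds `adjusted`)

print(round(chi2, 3), round(p_value, 2))
```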




21.46. In the foregoing example we have treated the two values in 1923 and 1924 
as a single value since they are equal. These so-called “ ties” frequently occur in ranking 
work and are a great nuisance. In the present case there is only one, and any reasonable 
method of treating it will not affect the test. Where “ties”? are numerous enough to 
make a serious difference some systematic method of treating them is desirable, particularly 
if more than two individuals are tied. They may be treated as a single observation, as 
in this case (although it would probably be better then to reduce N accordingly) ; or, 
preferably, they may be counted as a mean value, e.g. with a tied pair we should consider 
the first as greater than the second and then the second greater than the first, counting the 
number of turning-points or phases as one-half in each case and adding the two together. 
This, as in all similar ranking problems, makes the theoretical discussion of sampling very 
complicated, and if it is desired to make a precise use of significance tests a further possi- 
bility is to assume that the tied members are ranked in the order most unfavourable to 
the hypothesis under test, so as to be on the safe side. 


Conditional Tests 


21.47. When several unknown parameters are concerned, it may be difficult to find 
a sampling distribution dependent only on one of them which will form a basis for estimation 
or a test of significance. Sometimes, however, we can get rid of undesirable parameters 
by restricting the distribution in some way, and particularly by considering a distribution 
of samples which have some specified quality in common with the observed sample. Such 
distributions we shall, in Bartlett’s phrase, call conditional. Fisher expresses a similar 
idea by speaking of samples which have the same configuration. 

The most important application of this principle is in the testing of regression 
coefficients, which we shall consider in the next chapter. Here we give a simple illustration 
of the method for the Poisson distribution. 


Example 21.12 

Suppose we have two samples from populations which are known to give the Poisson 
type of distribution but may have different parameters. We wish to determine whether 
the populations could be identical. 

Suppose the frequencies of successes in the two samples are $r_1$ and $r_2$. If $\lambda$ is the
parameter of the parent (assumed the same for each), the probabilities of the samples are

$$\frac{e^{-\lambda} \lambda^{r_1}}{r_1!} \quad \text{and} \quad \frac{e^{-\lambda} \lambda^{r_2}}{r_2!},$$

and their joint probability is accordingly

$$P\{r_1, r_2 \mid \lambda\} = \frac{e^{-2\lambda} \lambda^{r_1 + r_2}}{r_1!\, r_2!}. \tag{21.78}$$

This depends on $\lambda$ and does not help us in answering the question. However, for the
probability of a sample with $r_1 + r_2$ successes we have (since the sum of two Poisson variates
with parameters $\lambda_1$, $\lambda_2$ is distributed in the same form with parameter $\lambda_1 + \lambda_2$):

$$P\{r_1 + r_2 \mid \lambda\} = \frac{e^{-2\lambda} (2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!},$$

and hence

$$\frac{P\{r_1, r_2 \mid \lambda\}}{P\{r_1 + r_2 \mid \lambda\}} = \frac{(r_1 + r_2)!}{r_1!\, r_2!\, 2^{r_1 + r_2}} = \frac{r!}{r_1!\, r_2!\, 2^r}, \tag{21.79}$$

where $r = r_1 + r_2$.
Now in accordance with Bayes' theorem we have

$$P\{r_1, r_2 \mid \lambda\} = P\{r_1, r_2 \mid r_1 + r_2\}\, P\{r_1 + r_2 \mid \lambda\}$$

and hence

$$P\{r_1, r_2 \mid r\} = \frac{r!}{r_1!\, r_2!\, 2^r}. \tag{21.80}$$

Consequently, if we confine our attention to samples for which the total number of successes
is r, the probability of the observed $r_1$ and $r_2$ is independent of $\lambda$ and is, in fact, the
corresponding term in the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^r$. The probability is clearly that of a partition of
r into the observed $r_1$ and $r_2$, and if it is small we suspect the hypothesis that the samples
emanated from the same population.

This kind of conditional inference raises the same sort of point as we noticed in 17.44. 
We decide beforehand that, whatever r turns out to be, we will make the inference in the 
population of samples which yield that value of r. 
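A minimal sketch of the conditional test follows. It evaluates the binomial term of (21.80) and, as one possible convention that the text does not prescribe, sums the probabilities of all partitions of r that are no more probable than the one observed. The counts in the usage line are invented purely for illustration.

```python
from math import comb


def conditional_prob(r1, r2):
    """P{r1, r2 | r}: the term of the binomial (1/2 + 1/2)^r, as in (21.80)."""
    r = r1 + r2
    return comb(r, r1) / 2 ** r


def conditional_p_value(r1, r2):
    """Total probability of partitions of r no more probable than the observed one."""
    r = r1 + r2
    p_obs = conditional_prob(r1, r2)
    return sum(conditional_prob(a, r - a) for a in range(r + 1)
               if conditional_prob(a, r - a) <= p_obs * (1.0 + 1e-12))


# hypothetical counts, for illustration only
print(conditional_prob(3, 12), conditional_p_value(3, 12))
```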


Pitman’s Tests 

21.48. In the extreme conditional case we may consider an inference in a population 
of samples the members of which are the same as those actually observed, the population 
being given by permutations or partitions of the observed values. The tests of ranking 
and periodicity given above are cases of this kind. A similar procedure has been advocated 
by Fisher in the analysis of variance and the design of experiments, and will be considered 
in due course. We now proceed to examine tests of the same nature proposed by Pitman 
(1937a, 1938). 

Suppose we have two sets of values $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$ with means $\bar{u}$ and $\bar{v}$
and the mean of the two together equal to $\bar{z}$. Given m + n objects, there are $\dfrac{(m+n)!}{m!\, n!}$
ways, say N, of separating them into two sets of m and n objects, of which the given set
is one. We call $|\bar{u} - \bar{v}|$ the spread of the separation. Since

$$m\bar{u} + n\bar{v} = (m + n)\bar{z},$$

we have also for the spread

$$\frac{(m+n)\,|\bar{u} - \bar{z}|}{n} = \frac{(m+n)\,|\Sigma(u) - m\bar{z}|}{mn}. \tag{21.81}$$

Take a probability $1 - \alpha = M/N$, where M is an integer. If R is a particular separation,
and the number of separations with spread not less than that of R is not greater than M,
we call R discordant. If there are M or more with a greater spread we call it concordant.
A separation which is neither concordant nor discordant is called neutral. If m = n the
separations occur in pairs with equal spreads, and we then take M to be even. The
discordant separations are most easily picked out as those with the largest values of
$|\Sigma u - m\bar{z}|$.

If the observed separation is arrived at by chance, the probability that it is discordant
is $M/N = 1 - \alpha$ when there are no neutral separations. If such exist, the probability
is less than $1 - \alpha$. Similarly the probability that a separation is concordant is $1 - \alpha$,
or more, as the case may be.

Two samples $u_1, \ldots, u_m$ and $v_1, \ldots, v_n$ are said to be discordant, concordant or
neutral according as the separations u and v are so. Having selected our significance
points dependent on $\alpha$, and hence having fixed M, we can find for what values of the spreads
a pair of samples is discordant or otherwise, and hence whether our observed pair is so.
If they are discordant we reject the hypothesis that they came from the same population.


Example 21.13 (Pitman, 1937a) 
Two samples have the following values :— 
0, 11, 12, 20 
16, 19, 22, 24, 29. 
Are they significantly different ? 
There are 9 members altogether and hence $\dfrac{9!}{5!\, 4!} = 126$ separations into samples of
five and four. We take $\alpha$ to be as near as possible to 0.95, corresponding to a 5-per-cent.
level of significance, and hence M = 6. We then find the groups which have the largest
values of the spread. We have $\bar{z} = 17$, so that $m\bar{z} = 68$, and using the form $|\Sigma u - 68|$
we find those groups of four from

0, 11, 12, 16, 19, 20, 22, 24, 29,

which give the maximum value to this quantity. They are:

Group              $|\Sigma u - 68|$
0, 11, 12, 16      29
0, 11, 12, 19      26
0, 11, 12, 20      25
29, 24, 22, 20     27
29, 24, 22, 19     26
29, 24, 20, 19     24

The group 0, 11, 12, 20 gives the fifth largest spread, and so with M = 6 the observed 
separation is discordant. Our inference is that the samples come from different popula- 
tions. Only in four other cases out of 126 should we get so large a spread in samples from 
the same population. 
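With only 126 separations the enumeration of this example can be carried out exhaustively. The sketch below counts how many separations of the nine values into four and five have a spread at least as great as the one observed, using $|\Sigma u - m\bar{z}|$ multiplied through by the total sample number so that the arithmetic stays in integers; the function name is illustrative.

```python
from itertools import combinations

u = [0, 11, 12, 20]
v = [16, 19, 22, 24, 29]


def spread_rank(first, second):
    """Return (separations with spread >= observed, total number of separations)."""
    pooled = first + second
    m, n_all, total = len(first), len(pooled), sum(pooled)

    def score(group_sum):
        # n_all * |sum(u) - m * zbar|, an integer multiple of the spread
        return abs(n_all * group_sum - m * total)

    observed = score(sum(first))
    scores = [score(sum(pooled[i] for i in idx))
              for idx in combinations(range(n_all), m)]
    return sum(1 for s in scores if s >= observed), len(scores)


at_least, n_separations = spread_rank(u, v)
print(at_least, n_separations)
```

With M = 6 the observed separation is discordant when the first number does not exceed 6.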


21.49. The extended use of the above test is barred by practical inconvenience,
but an approximate form based on a different measure of discordance may be used. We
now put

$$w = \frac{m(\bar{u} - \bar{z})^2}{(N - m)\,\mu_2}, \tag{21.82}$$

where $\mu_2$ is the variance of the samples taken together and is thus a constant. The function
w is hence linear in $(\bar{u} - \bar{z})^2$, the device of squaring, as usual, getting rid of difficulties
associated with the use of the modulus $|\bar{u} - \bar{z}|$. N here refers to the total sample
number m + n.

Now, for the moments of $\bar{u} - \bar{z}$ we may use the results of 11.26 (vol. I, p. 284), giving
the moments of the mean in sampling from a finite population; for $\bar{z}$ is the population
mean. Replacing n in the formulae of that section by m and putting N = m + n,
we have:

$$E(\bar{u} - \bar{z}) = 0,$$

$$E(\bar{u} - \bar{z})^2 = \frac{(N - m)\,\mu_2}{m(N - 1)},$$

$$E(\bar{u} - \bar{z})^4 = \frac{(N - m)\,[\{N^2 + N - 6m(N - m)\}\mu_4 + 3N(N - m - 1)(m - 1)\mu_2^2]}{m^3 (N - 1)(N - 2)(N - 3)},$$

and hence for the first two moments of w we find

$$E(w) = \frac{1}{N - 1}, \tag{21.83}$$

$$E(w^2) = \frac{3}{N^2 - 1}(1 + \delta), \tag{21.84}$$

where

$$\delta = \frac{2\{(N + 1)\gamma_2 + 6\}}{3N(N - 3)} \cdot \frac{N}{2(N - 2)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}, \tag{21.85}$$

$\gamma_2$ referring to the measure of kurtosis $\dfrac{\mu_4}{\mu_2^2} - 3$.

For fixed N the modulus of the second factor in (21.85) will be found to have a maximum
of 1 when $m = \tfrac{1}{2}N$, and it takes this value again at

$$\frac{N - 2m}{N} = \pm\sqrt{\frac{N - 2}{2N - 1}},$$

giving $\dfrac{m}{n} = \tfrac{1}{5}$ or 5 for N = 14 and wider limits for larger N. It will also be found
that for N > 6 the factor

$$\frac{N}{2(N - 2)}\left\{\frac{N(N + 1)}{m(N - m)} - 6\right\}$$

is not greater in absolute value than 1 if $\tfrac{1}{4} \leq \dfrac{m}{n} \leq 4$,
i.e. unless one sample is more than four times as big as the other. Thus for such values
and $\gamma_2$ not large, $\delta$ is small, and approximately

$$E(w^2) = \frac{3}{N^2 - 1}. \tag{21.86}$$

Similarly, using the fact that for large m and N

$$E(\bar{u} - \bar{z})^{2r} = 1 \cdot 3 \cdot 5 \ldots (2r - 1)\left(1 - \frac{m}{N}\right)^r \frac{\mu_2^r}{m^r},$$

we find approximately

$$E(w^3) = \frac{3 \cdot 5}{(N - 1)(N + 1)(N + 3)}. \tag{21.87}$$
The moments given by (21.83), (21.86) and (21.87) are those of the B-distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}N - 1)}\, w^{-\frac{1}{2}} (1 - w)^{\frac{1}{2}N - 2}\, dw, \tag{21.88}$$

which can therefore be used to approximate to the distribution of w. In point of fact the
agreement seems to be remarkably close.
w may also be written

$$w = \frac{\dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2 + \dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}, \tag{21.89}$$

which shows that $w \leq 1$.
We also have

$$\frac{w}{1 - w} = \frac{\dfrac{mn}{m + n}(\bar{u} - \bar{v})^2}{\Sigma(u - \bar{u})^2 + \Sigma(v - \bar{v})^2}, \tag{21.90}$$

and it is instructive to observe that the function on the right is the same as that of
$\dfrac{t^2}{n_1 + n_2 - 2}$ of (21.32) with a few changes of notation. A transformation of (21.88) to
"Student's" form will in fact show that we can test $\sqrt{\dfrac{\nu w}{1 - w}}$ in the t-distribution with
$\nu = m + n - 2$; for (21.88) then becomes

$$dF \propto \frac{du}{\left(1 + \dfrac{u^2}{\nu}\right)^{\frac{1}{2}(\nu + 1)}}, \tag{21.91}$$

where

$$u = \sqrt{\frac{\nu w}{1 - w}}. \tag{21.92}$$
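The exact moments of w may be checked by enumerating every separation of a small pooled sample. The sketch below does so for the nine values of Example 21.13, taking w as $m(\bar{u}-\bar{z})^2/\{(N-m)\mu_2\}$ and comparing the enumerated first two moments with $1/(N-1)$ and $3(1+\delta)/(N^2-1)$. The expression used for $\delta$ is the one obtained by working through the finite-population fourth moment and should be treated as an assumption about the printed (21.85); all names are illustrative.

```python
from itertools import combinations
from math import sqrt

pooled = [0, 11, 12, 20, 16, 19, 22, 24, 29]   # first four values form the sample u
m = 4


def w_moments(values, m_size):
    """Enumerated E(w), E(w^2) and the values predicted for them."""
    N = len(values)
    zbar = sum(values) / N
    mu2 = sum((x - zbar) ** 2 for x in values) / N
    mu4 = sum((x - zbar) ** 4 for x in values) / N
    ws = [m_size * (sum(g) / m_size - zbar) ** 2 / ((N - m_size) * mu2)
          for g in combinations(values, m_size)]
    e1 = sum(ws) / len(ws)
    e2 = sum(w * w for w in ws) / len(ws)
    gamma2 = mu4 / mu2 ** 2 - 3.0
    # assumed form of delta, algebraically equal to the two-factor form of (21.85)
    delta = (((N + 1) * gamma2 / 3.0 + 2.0) / ((N - 2) * (N - 3))
             * (N * (N + 1) / (m_size * (N - m_size)) - 6.0))
    return e1, e2, 1.0 / (N - 1), 3.0 / (N * N - 1) * (1.0 + delta)


e1, e2, e1_formula, e2_formula = w_moments(pooled, m)

# observed w and the corresponding "Student" ratio with nu = N - 2
N = len(pooled)
zbar = sum(pooled) / N
mu2 = sum((x - zbar) ** 2 for x in pooled) / N
w_obs = m * (sum(pooled[:m]) / m - zbar) ** 2 / ((N - m) * mu2)
u_obs = sqrt((N - 2) * w_obs / (1.0 - w_obs))
print(round(e1, 4), round(e2, 4), round(w_obs, 4), round(u_obs, 2))
```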


21.50. A test of a similar kind may be evolved for the product-moment correlation.
Suppose we have two samples $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ and calculate

$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{(\operatorname{var} x\, \operatorname{var} y)}}$$

for every possible pairing of the x's and y's, n! in number. As before, if we choose an
$\alpha$ and hence a number M such that $1 - \alpha = M/n!$ we may determine those pairings for
which r is greatest and reject the hypothesis that x and y are independent in such cases
if they fall among the M greatest. Since the denominator of r is constant, this is equivalent
to attributing significance to the values of $|\Sigma xy - n\bar{x}\bar{y}|$ which exceed a given value
determined by $\alpha$.
Taking $\bar{x} = \bar{y} = 0$, without loss of generality we find

$$E(r) = \frac{E(\Sigma xy)}{n\sqrt{(\operatorname{var} x\, \operatorname{var} y)}} = 0, \tag{21.93}$$

$$E(r^2) = \frac{E(\Sigma xy)^2}{n^2 \operatorname{var} x\, \operatorname{var} y} = \frac{1}{n - 1}, \tag{21.94}$$

and similarly, if $\gamma_1$, $\gamma_2$ are the modified measures of skewness and kurtosis for x (expressed
in terms of k-statistics, i.e. $\gamma_1 = \dfrac{k_3}{k_2^{3/2}}$, $\gamma_2 = \dfrac{k_4}{k_2^2}$) and $\gamma_1'$ and $\gamma_2'$ those for y, it will be found that

$$E(r^3) = \frac{n - 2}{n(n - 1)^2}\,\gamma_1 \gamma_1', \tag{21.95}$$

$$E(r^4) = \frac{3}{(n - 1)(n + 1)} + \frac{(n - 2)(n - 3)}{n(n + 1)(n - 1)^3}\,\gamma_2 \gamma_2'. \tag{21.96}$$

Thus to order $n^{-1}$ we have

$$E(r) = E(r^3) = 0, \qquad E(r^2) = \frac{1}{n - 1}, \qquad E(r^4) = \frac{3}{(n - 1)(n + 1)}. \tag{21.97}$$

These are the first four moments of the distribution

$$dF = \frac{1}{B(\tfrac{1}{2}, \tfrac{1}{2}n - 1)}\,(1 - r^2)^{\frac{1}{2}(n - 4)}\, dr. \tag{21.98}$$

Thus r may be tested in this distribution or equivalently, putting

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}, \tag{21.99}$$

in "Student's" form with $\nu = n - 2$.

In particular, if the numbers x and y reduce to rankings, we have the test already 
introduced in 21.41. Compare also the result given for the distribution of Spearman’s 
p in 16.18 (vol. I, p. 401). 
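For small n the permutation distribution of r can be enumerated directly, which gives both an exact significance level and a check on the moments quoted above. The sketch below is illustrative only: the data in the usage lines are invented, and the two-sided convention (counting pairings with $|r|$ at least as large as that observed) is an assumption rather than something laid down in the text.

```python
from itertools import permutations
from math import sqrt


def permutation_r_values(x, y):
    """Correlation coefficient for every pairing of the x's with the y's."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    dx = [a - xbar for a in x]
    dy = [b - ybar for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    # permutations() yields the original order first, so rs[0] is the observed r
    return [sum(a * b for a, b in zip(dx, perm)) / denom for perm in permutations(dy)]


def permutation_p_value(x, y):
    """Proportion of pairings whose |r| is at least the observed |r|."""
    rs = permutation_r_values(x, y)
    observed = abs(rs[0])
    return sum(1 for r in rs if abs(r) >= observed - 1e-12) / len(rs)


# hypothetical data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 1.9, 4.2, 3.8, 6.5, 5.0]
rs = permutation_r_values(x, y)
mean_r = sum(rs) / len(rs)
mean_r2 = sum(r * r for r in rs) / len(rs)
print(round(rs[0], 3), round(permutation_p_value(x, y), 4))
```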


The Combination of Tests 


21.51. It sometimes happens that we have a number of tests of significance, all 
yielding various probabilities, which we wish to express as a single probability. Suppose, 
for instance, that we conduct an experiment five times and that some test, such as that 
of the mean, gives probabilities to the observed deviations of 0-2, 0-8, 0-01, 0-1, 0-03. In 
the ordinary way two of these values would be regarded as significant and the other three 
not. What conclusion are we to draw as to the five taken together ? 

Suppose we have $k$ values of the probability, $p_1, \ldots, p_k$. The distribution of any
particular $p$ is rectangular, i.e.

$$dF = dp, \qquad 0 \leq p \leq 1.$$

Hence, if $x = -\log p$ the distribution of $x$ is

$$dF = e^{-x}\,dx, \qquad 0 \leq x \leq \infty,$$

and its characteristic function is

$$\phi(t) = \int_0^{\infty} e^{itx - x}\,dx = \frac{1}{1 - it}.$$

Hence if we write

$$\lambda = -\sum_{j=1}^{k}\log p_j, \qquad (21.100)$$

the distribution of $\lambda$ has a characteristic function

$$\phi(t) = \frac{1}{(1 - it)^k}$$

and is therefore given by

$$dF = \frac{1}{\Gamma(k)}\,e^{-\lambda}\lambda^{k-1}\,d\lambda. \qquad (21.101)$$

Putting

$$M^2 = 2\lambda = -2\,\Sigma\log p = -2\log\Pi p, \qquad (21.102)$$

we see that the distribution of $M^2$ is

$$dF \propto M^{2k-1}\exp(-\tfrac{1}{2}M^2)\,dM, \qquad (21.103)$$

or $M^2$ is distributed as $\chi^2$ with $\nu = 2k$ degrees of freedom.
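
The combination rule may be sketched in a few lines. This is a minimal illustration, not a library routine: the function names are hypothetical, and the tail probability uses the closed form available when the number of degrees of freedom is even, namely $P(\chi^2_{2k} > m) = e^{-m/2}\sum_{j=0}^{k-1}(m/2)^j/j!$. It is applied to the five probabilities quoted at the beginning of 21.51.

```python
from math import exp, factorial, log


def chi2_upper_tail_even(m2, dof):
    """P(chi-square with an even number dof of degrees of freedom exceeds m2)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = m2 / 2.0
    k = dof // 2
    return exp(-half) * sum(half ** j / factorial(j) for j in range(k))


def combine_probabilities(ps):
    """Return M^2 = -2 * sum(log p) and its tail probability on 2k degrees of freedom."""
    m2 = -2.0 * sum(log(p) for p in ps)
    return m2, chi2_upper_tail_even(m2, 2 * len(ps))


# The five probabilities quoted in 21.51.
m2, p_combined = combine_probabilities([0.2, 0.8, 0.01, 0.1, 0.03])
```

With a single probability the rule returns that probability unchanged, which is a convenient check on the arithmetic; for the five values above the combined probability comes out below 0.01.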


Example 21.14 (K. Pearson, 1933b, quoting data from E. M. Elderton, 1933). 


Pairs of boys were selected in various age-groups and one member of each pair fed 
on raw, the other on pasteurised milk. The differences in gain in weight are shown in 
the following table, together with the standard errors of the differences based on large- 
sample theory. 


(1)          (2)        (3)                   (4)          (5)                   (6)
Mean age     Number     Mean difference in    Standard     Probability of        log10 p_k
(in years)   of pairs   weight gained, raw    error of     observed difference
                        less pasteurised      difference   or greater, p_k

6…           73         −0.066                0.054        0.8888                $\bar{1}.9488$
7…           76         +0.022                0.053        0.3409                $\bar{1}.5326$
8…           71         −0.003                0.052        0.5239                $\bar{1}.7193$
9…           …          +0.011                0.055        0.4207                $\bar{1}.6240$
10…          60         +0.002                0.057        0.4840                $\bar{1}.6849$

                                                                                 $\bar{2}.5096$


The values of $p_k$ in column (5) are obtained by expressing the observed deviations in column
(3) in terms of the standard error in column (4) and hence determining the probability
from the normal integral. We have

$$M^2 = -2\,\Sigma\log_e p = -\frac{2\,\Sigma\log_{10} p}{\log_{10} e} = 6.86, \qquad \nu = 10.$$

The probability of a value of $\chi^2 > 6.86$ for $\nu = 10$ is about 0.74, and the test as a whole
does not support the hypothesis of a differential effect on feeding between the two kinds
of milk.
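
The arithmetic of Example 21.14 can be retraced as follows. This is a sketch only: the variable names are hypothetical, and the probabilities recomputed from the rounded entries of columns (3) and (4) agree with column (5) only to about two decimal places, presumably because the published differences and standard errors are themselves rounded.

```python
from math import e, erfc, log10, sqrt

# Columns (3), (4) and (5) of the table in Example 21.14.
differences = [-0.066, 0.022, -0.003, 0.011, 0.002]
standard_errors = [0.054, 0.053, 0.052, 0.055, 0.057]
tabulated_p = [0.8888, 0.3409, 0.5239, 0.4207, 0.4840]


def upper_tail_probability(z):
    """Probability that a standard normal deviate is z or greater."""
    return 0.5 * erfc(z / sqrt(2.0))


# Column (5) recomputed from the deviations expressed in standard-error units.
recomputed_p = [upper_tail_probability(d / s)
                for d, s in zip(differences, standard_errors)]

# Column (6) total and M^2, converting common to natural logarithms.
sum_log10 = sum(log10(p) for p in tabulated_p)
m_squared = -2.0 * sum_log10 / log10(e)
```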




Nuisance Parameters 

21.52. From the foregoing it will have been clear that in the theories of both estima- 
tion and significance one of the main problems is to find a distribution which is independent 
of certain unknown parameters in the parent population. Parameters of this kind, neces- 
sary as they are in the specification of the parent and the precise formulation of our problem, 
can be a nuisance when we are seeking to make exact statements about some other para- 
meter on which interest is focussed. For this reason they have been named nuisance 
parameters. It may be useful if at this point we summarise the methods available for 
getting rid of them. 


(a) First of all there is the process of "Studentisation", whereby we can remove
scale parameters from the sampling distribution by a suitable choice of statistic. (Cf. 
19.26.) 

(b) Secondly, we may restrict the inference to a sub-population which is conditioned 
by having certain values in common with the observed sample. It sometimes happens 
that the distribution in this sub-population does not contain the nuisance parameters, 
whereas a distribution in the full population would do so (21.47). 

(c) In the comparison of two samples, or even the testing of a single sample involving 
an unknown mean, that parameter may be eliminated by differencing (21.27). As regards 
the case of the single sample, it is clear that if $x_1, \ldots, x_n$ are independent and $n$ is even,
the values $x_1 - x_2$, $x_3 - x_4$, ..., $x_{n-1} - x_n$ will also be independent and be distributed
with zero mean (though of course there are only $\frac{1}{2}n$ of them).

(d) Transformations of the variate may sometimes either eliminate the nuisance 
parameter altogether or reduce its importance. The most noteworthy case is Fisher’s 
transformation of the correlation coefficient (14.18, vol. I, p. 345). The transformed 
function $z - \zeta$ is distributed nearly normally with variance $1/(n - 3)$, so that the difference
of two correlations when transformed does not involve the common value of $\zeta$.
(Cf. Example 14.8.) 

(e) We may find distributions which are independent of the unknown parameters,
and even of the population, by using the methods of ranking or considering partitions 
(21.41, 21.48). 

(f) The fiducial argument, in at least one known case, gives a test independent of 
unknown parameters, namely the Behrens test (20.13). 
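
Method (d) lends itself to a short illustration. The sketch below assumes two independent samples of sizes n1 and n2 from bivariate normal populations with a common (unknown) correlation; the function name and the numerical inputs in the usage note are hypothetical. It uses only the fact stated above, that each transformed coefficient is nearly normal with variance 1/(n − 3), so that the difference of the two transformed values has variance 1/(n1 − 3) + 1/(n2 − 3) and does not involve the common parent value.

```python
from math import atanh, erfc, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Approximate normal deviate and two-sided probability for the difference
    of two sample correlations, using Fisher's transformation z = artanh(r)."""
    difference = atanh(r1) - atanh(r2)
    standard_error = sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    deviate = difference / standard_error
    two_sided_probability = erfc(abs(deviate) / sqrt(2.0))
    return deviate, two_sided_probability
```

For instance, `compare_correlations(0.8, 28, 0.3, 28)` refers the difference of the transformed values to a standard error of $\sqrt{(2/25)}$.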


It must be realised, however, that all these types of inference do not stand on equal 
footings. In particular (e) requires further examination, as we proceed to show. 


21.53. We may now review the many different tests which have been described in 
this chapter and consider more closely the type of reasoning on which they are based. 
We may group our tests broadly into two classes, those which give a direct test of a given 
value of a parent parameter and those which do not. 

The first class rests on a type of inference which we have discussed fully in connection 
with the problem of estimation. There is, in fact, only a difference in viewpoint, and little 
or none in essential ideas, between estimating a parameter by assigning a range to accept- 
able values (whether by confidence intervals or fiducial intervals) and ascertaining whether 
some prior value lies in that range. The significance of parameters in large samples, the 
test of the mean in normal samples by "Student's" distribution, the test of a correlation
coefficient in normal samples, and others of the same kind relating to a specified parameter
have the same logical foundation as the theory of confidence intervals or the theory of
fiducial intervals, whichever is preferred. They all provide for the consideration of alternative
values of the parameter.


21.54. The second group of tests are not, on the face of it, concerned with the value 
of a parameter in a parent population, and some of them take no account of possible alter- 
native hypotheses. Consider, for example, a test of normality or a test of randomness. 
The hypothesis is that the population is normal or the sampling is random, as the case 
may be, but this does not specify a parameter. What alternatives to normality or to 
randomness are we considering, if any ? We must have the existence of such alternatives 
in mind, however vaguely, for otherwise we should not be testing these particular 
hypotheses. But can we say what they are? And if not, do our inferences remain valid ? 
When working with a probability $\alpha$ shall we still be right in a proportion $\alpha$ of the cases in
the long run ? 


21.55. The kind of argument we have used in all these cases is this: on the given 
hypothesis the observed sample and all samples providing a greater value of the statistic 
being used for the test have a small probability. Therefore we reject the hypothesis. 

We may note at once that in rejecting the hypothesis we do so in favour of another 
hypothesis for which the observations are more probable. We may not express this thought 
explicitly, but it is there. The various statistics we use for testing normality, for instance 
$b_1$, can arise with greater probability from other populations which are skew or have a
marked deviation from mesokurtosis; the fact is assumed as self-evident (as indeed it 
is) and hence, if the statistic is improbable for the normal case there will be non-normal 
cases of greater probability. We remark, nevertheless, that the actual probability $\alpha$ is
calculated on the normal hypothesis and does not hold for the non-normal cases. Thus
we can no longer assert that we are right in proportion $\alpha$ of the cases. We are therefore
relying on a less definite principle of inference to the effect that we reject a hypothesis 
which gives an improbable value to observation, provided that there exists some other 
hypothesis which gives a more probable value. 


21.56. A similar argument applies to tests of randomness. It is obvious that many 
other methods of generating a series exist which give a greater probability to a systematic 
series than the random method, and in rejecting the latter we do so more or less consciously 
in favour of the former. Our intuitive feelings on the point lead us to apply one test when 
we have the possibility of systematic order in mind (the ranking test) and another when 
we are interested in oscillations (the phase test). What we are doing, in effect, is selecting 
the test of randomness which we feel to discriminate best between the hypothesis of 
randomness and the alternative possibilities. 


21.57. Although, therefore, much remains to be done in putting tests of normality, 
randomness and goodness of fit on a formal logical basis, there do not appear to be any 
serious difficulties in doing so insofar as the specification of alternative hypotheses is con- 
cerned. But there remains the difficulty hinted at at the beginning of 21.55. In the 
majority of cases we have a probability $1 - \alpha$ that the observed statistic $t_0$ will be exceeded,
and if this is small reject the hypothesis. But why exceeded? Why reject the hypothesis 
because of the improbability of a number of events which have not happened ? 

Here also it seems that a closer inquiry into the logic of the process would be worth 
while. We have seen how it can be justified by confidence-interval or fiducial theory
when a parameter is under consideration. When no parameter is specified, the process
must, in the present state of our knowledge, rest on more intuitive ideas. My own view 
is that, in a vague kind of way, we are really considering the range of values of a parameter 
without realising it. In selecting a statistic to carry out the test, we usually relate it to 
the sort of effect we are expecting to divert the real state of affairs from those of 
our hypothesis. For instance, if we suspect cyclical effects in a random series we base 
a test on oscillations in that series. The further the series deviates from randomness the 
greater will be the value of our statistic ; and consequently, if we could measure deviation 
from randomness (in the direction of cyclicality), we should have a parameter which could 
be located in a range in the manner of confidence intervals. Such a range would exclude 
the larger values of our statistic if it can be regarded in any sense as estimating the para- 
meter (or, more generally, as increasing with it); and hence the procedure of rejecting the 
hypothesis if the statistic is among these large values may be justified. 


21.58. It is for this reason that we began the chapter by defining tests of significance 
in relation to a parameter-value given a priori. It seems probable that in the ultimate 
analysis no other definition will be satisfactory. The fact that in this chapter we have 
given tests of hypotheses which do not appear to specify a parameter value is, I think, 
merely a reflection of the fact that the nature of those hypotheses and the inferences about 
them are not usually understood clearly but are based on more or less intuitive ideas. It 
is probable that many of these ideas are sound and can be given explicit logical foundation ; 
but the matter awaits investigation by the statistical logician. 


21.59. There remains for consideration the type of inference used in Pitman’s tests 
(21.48 and 21.49). These are of the character of tests of randomness. Given a set of 
values, we consider all the arrangements in which they could have happened and reject 
the hypothesis if the observed arrangement is improbable. Here again, as it seems to me, 
there is a suppressed series of alternative hypotheses which would make the observed 
value more probable ; and in choosing the test, such as the “spread” or the high value 
of a correlation, we are intuitively relating the magnitude of a statistic to the deviation 
from randomness. Pitman himself has shown, however, that when the hypothesis is 
definite and specifies the difference of two means, the tests give confidence intervals in the 
ordinary way (cf. Exercise 21.15).

We shall resume the general theory of tests of significance in Chapter 26.


NOTES AND REFERENCES 


For the use of the t-distribution in non-normal cases see Geary (1936b) and Bartlett
(1935a), the latter of whom shows that, for moderate samples, departures from mesokurtosis
are not very serious. For approximations to t in the normal case see Hendricks
(1936) and Hotelling and Frankel (1938). For approximations to the z-distribution see 
Cochran (1940a), Cornish and Fisher (1937), and Paulson (1942). See also references to 
Chapter 23. 

For the further theory of the $\chi^2$-test see Neyman and Pearson (1928, 1931a) and for
another test of goodness of fit Neyman (1937a). The theory of 21.44 has been studied 
by a number of writers, notably by André (1884), Kermack and McKendrick (1936, 1937), 
and Wallis and Moore (1941). 

The amalgamation of tests given in 21.51 was apparently first given by Fisher in an
early edition of Statistical Methods for Research Workers and was studied in detail by
K. Pearson (1933) under the title of the $P_\lambda$-test, and by E. S. Pearson (1938).

For a test of significance of the difference of two variances in samples from a bivariate 
normal population see Hirschfeld (1937), Finney (1938), Pitman (1939c), Morgan (1939), 
and De Lury (1938); and see Exercise 21.3. 

For the tests by Pitman, see his papers of 1937a, 1938. The similar problem in the 
testing of homogeneity in the analysis of variance has also been studied—see references 
to Chapters 23 and 24. 

For the test of difference of means when variances are unequal from the point of view 
of confidence intervals see Welch (1938) and the appendix to this paper by Miss Tanburn. 


EXERCISES 
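21.1. For the population represented approximately by

$$dF = \frac{1}{\sqrt{(2\pi)}}\left\{1 - \frac{\kappa_3}{6}(3x - x^3)\right\}e^{-\frac{1}{2}x^2}\,dx,$$

show that, if $\kappa_3^2$ is negligible, the joint probability of a sample $x_1, \ldots, x_n$ differs from that
if $\kappa_3$ is zero by a term

$$\frac{\kappa_3}{6(2\pi)^{\frac{n}{2}}}\sum_{j=1}^{n}(x_j^3 - 3x_j)\,\exp\left(-\tfrac{1}{2}\Sigma x^2\right)dx_1 \ldots dx_n.$$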


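By the transformation

$$y_2 = \frac{1}{\sqrt{6}}(x_1 + x_2 - 2x_3),$$

$$\cdots$$

$$y_n = \frac{1}{\sqrt{n}}(x_1 + x_2 + \cdots + x_n)$$

and the further transformation

$$y_1 = \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\sin\phi_1\sin\phi_0$$

$$y_2 = \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\sin\phi_1\cos\phi_0$$

$$y_3 = \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\cos\phi_1$$

$$\cdots$$

$$y_{n-1} = \rho\cos\phi_{n-3},$$

show that the corrective term to the distribution of "Student's" t is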


eee > 70 mee ee | | ed 
al (ae Ay ie eee > lee) ( heee 


and hence obtain equation (21.11). 
(Geary, 1936b.) 


21.2. By the polar transformation of the type of the previous exercise applied to 
all n variates show that if a random sample is drawn from a normal population with zero 
mean the frequency element may be written as 


$$\frac{1}{(2\pi)^{\frac{n}{2}}}\,\rho^{n-1}e^{-\frac{1}{2}\rho^2}\,d\rho\;d\phi_0\;\sin\phi_1\,d\phi_1\;\sin^2\phi_2\,d\phi_2 \ldots \sin^{n-2}\phi_{n-2}\,d\phi_{n-2}.$$

Hence if $w = \dfrac{\Sigma|x|}{ns}$, where $s^2$ is the sample variance, the distribution of $w$ is independent


ie oe 2 
of that of s. Hence show that for the distribution of w, writing a = Ye = 


My pao ee + 3)}* me 
in+1) Jn 


1 2 
[a Say + a? ni) } 


1,0 n+1 
[ly = = {In + 3n® + a2 n') )/ 


[ty = a {3n + (8a? + 3) n@) + 6a? n8) + af ait} | 
n 


Hence show that for $n = 50$, $\sqrt{\beta_1} = -0.24$ and $\beta_2 = 3.10$, indicating fairly rapid tendency
to normality. 


n+2 
= 


(Geary, 1935a.)


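21.3. Show that in samples from a normal bivariate population

$$dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right]dx\,dy,$$

the functions

$$u = \frac{x}{\sigma_1} + \frac{y}{\sigma_2}, \qquad v = \frac{x}{\sigma_1} - \frac{y}{\sigma_2}$$

are distributed independently and that their correlation coefficient $R$ may be written

$$R = \frac{a - \alpha}{\sqrt{\{(a + \alpha)^2 - 4a\alpha r^2\}}},$$

where

$$a = \frac{\Sigma(x - \bar{x})^2}{\Sigma(y - \bar{y})^2}, \qquad \alpha = \frac{\sigma_1^2}{\sigma_2^2},$$

and $r$ is the correlation between the observed $x$'s and $y$'s. Hence show that

$$t = \frac{R\sqrt{(n-2)}}{\sqrt{(1 - R^2)}} = \frac{(a - \alpha)\sqrt{(n-2)}}{\sqrt{\{4(1 - r^2)a\alpha\}}}$$

is distributed as "Student's" t with $n - 2$ degrees of freedom. Show how to test the
ratio $\alpha$ from this result.


(Pitman, 1939c. The test has the remarkable property of being independent of the
parent correlation $\rho$.)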


21.4. If an even number n of members of a sample come from a population with 
mean yu, show how to find a sample of half the size distributed with twice the variance 
about zero mean. Hence show how to extend the result of Exercise 21.2 to the case where 
the population mean is not zero. 


21.5. If a parameter admits of a sufficient estimator, show that a test of its significance
can be derived direct from the likelihood function. 


21.6. Derive equations (21.47) and (21.48). 




21.7. Let $l_{11}, l_{12}, \ldots, l_{1,n-1}$ be $(n - 1)$ linear functions of the observations which
are orthogonal to one another and to $\bar{x}_1$, and let them have zero mean and variance $\sigma_1^2$.
Similarly define $l_{21}, \ldots, l_{2,n-1}$.

Then, in two samples of $n$ from normal populations with equal means and variances
$\sigma_1^2$ and $\sigma_2^2$, the function

$$\frac{\sqrt{n}\,(\bar{x}_1 - \bar{x}_2)}{\{\Sigma(l_{1j} + l_{2j})^2/(n - 1)\}^{\frac{1}{2}}}$$

will be distributed as "Student's" t with $n - 1$ degrees of freedom.


(Bartlett, 1937c, and Welch, 1938b. The test does not depend on the ratio $\sigma_1^2/\sigma_2^2$ and
can be extended to the case of unequal sample numbers, but only at the expense of losing
efficiency in the sense that the degrees of freedom number one less than the lower of the sample
numbers.)


21.8. Given two samples of $n_1$, $n_2$ members from normal populations with unequal
variances, show that by picking $n_1$ members at random from the $n_2$ (where $n_2 > n_1$) and
pairing them at random with the members of the first sample, a test of significance of
difference of means can be based on "Student's" distribution independently of the variance
ratio in the populations. (This test, again, is exact, but sacrifices the information of
$n_2 - n_1$ members of the second sample.)


21.9. If z is the ratio of the sample mean to sample standard deviation in normal 
samples, and n is large enough for the distribution of the variance to be regarded as normal, 
show that 


2 t 
C, V(2n) ————~ =c 2n 
VGe) a2) VO Taam 1} 
is distributed approximately normally with zero mean and unit variance, where 
n 
_- ft) oy a 
ceeN 1 r(25+) 4n 32n?° 
a 2 


(Hendricks, 1936.) 


21.10. If $x$, $y$ have a continuous frequency function $f(x, y)$, their characteristic
function is

$$\phi(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(iux + ivy)\,f(x, y)\,dx\,dy.$$

Show that the distribution of $x$ when $y$ is given has a characteristic function

$$\phi(u \mid y) = \frac{\int_{-\infty}^{\infty}e^{-ivy}\,\phi(u, v)\,dv}{\int_{-\infty}^{\infty}e^{-ivy}\,\phi(0, v)\,dv}.$$


(Bartlett, 19380.) 


21.11. If a set of parameters $\theta_1, \ldots, \theta_k$ admit of a set of sufficient estimators, show
that conditional inferences independent of $\theta_1, \ldots, \theta_k$ are possible, the conditions being
that the estimators are constant for the samples concerned. Conversely, if conditional
inference is possible, the irrelevant parameters must admit a set of sufficient estimators.
(Bartlett, 1937c.)


21.12. In a normal sample of $n$ values show that if

$$\xi = \frac{x_1 - x_2}{\sqrt{(2n)}}$$

and

$$ns'^2 = \Sigma x^2 - n\xi^2 = \tfrac{1}{2}(x_1 + x_2)^2 + \sum_{j=3}^{n}x_j^2,$$

where $x_1$, $x_2$ are two sample values taken at random, then

$$\zeta = \frac{\xi}{s'}$$

is distributed in the same form as "Student's" ratio $z = \bar{x}/s$ when the parent mean is
zero. Show further that
Weel le 
(Neyman, Lectures and Conferences on Mathematical Statistics, 1938. The example shows
that if $z$ is "significantly" large, $\zeta$ must be small and hence the two criteria based on $z$ and $\zeta$
lead to opposite conclusions.)


21.13. In a 2 x 2 contingency table, show that the border relative frequencies 
are, on the hypothesis of independence, sufficient estimators for the probability of success 
of the two attributes defining the table. Hence derive the exact test of significance in 
such a table as a conditional inference. (The exact test is given in 12.16, vol. I, p. 303.) 

(Bartlett, 1937c.) 


21.14. If two samples are drawn from a bivariate normal population, $v_{12}$ and $v'_{12}$
are their covariances, $V_{11}$ and $V_{22}$ are the variances of the pooled samples, and $V_{12}$ its
covariance, show that the distribution function

$$F(v_{12}, v'_{12} \mid V_{11}, V_{22}, V_{12})$$

is independent of the parent variances and correlation. Hence show that the distribution
would provide a test of the difference of sample covariances.
(Bartlett, 1937c.) 


21.15. If two samples $x_1, \ldots, x_{n_1}$ and $y_1, \ldots, y_{n_2}$ are drawn from populations which
differ only in location and the difference in means is $d$, show by considering the values
typified by $x + d$ and $y$ how to set confidence limits to $d$, based on the distribution of
$w$ of equation (21.82).


(Pitman, 1937a.) 


21.16. In the previous exercise show that the confidence limits for d are the same 
as those based on “ Student’s ” distribution in the case of normal populations with different 
means and identical variances (equation (21.32) ). Explain why the latter test is only 
valid for normal populations, whereas the former is valid for any population. 


CHAPTER 22 


REGRESSION 


The Analytical Theory of Regression 


22.1. When considering the theory of correlation in Chapters 14 and 15 we introduced 
the concept of linear regression of one variate on a set of "independent" variates. We
shall now study this subject more fully and extend the theory to the case where the regres- 
sion lines are not straight. In the first instance we confine our attention to bivariate 
populations, but the majority of our results are easily generalised to the multivariate case. 

In speaking of one variate as "dependent" and the others as "independent" we
introduce what may be a source of confusion. In general, all the variates are dependent
in the statistical sense, each on the others, and in special cases may even be functionally
dependent. In selecting one for separate consideration and in discussing its dependence
on the others we are usually attempting to solve a problem in estimation: for given values
of the other variates, what is the best estimator of the "dependent" variate, or its central
value in the distribution which it has for such given values? The idea of "given" values,
that is to say values which can be selected at will, leads to our referring to them as
"independent", though they may be statistically dependent on one another. It might perhaps
be better to use different words, but the practice is so common that we make no attempt 
to improve it. Once the point has been understood no difficulty arises in practice. 


22.2. If we have two variates $x$, $y$ with frequency function $f(x, y)$, then for any
fixed value of $y$ the mean of $x$, say $\bar{x}_y$, is given by

$$\bar{x}_y = \int_{-\infty}^{\infty}x f(x, y)\,dx \Big/ \int_{-\infty}^{\infty}f(x, y)\,dx. \qquad (22.1)$$

The expression on the right is a function of $y$ and thus the points whose co-ordinates
are $(\bar{x}_y, y)$ have a locus which is, in general, a smooth curve. This curve is defined as the
line of regression of $x$ on $y$, and may be written

$$X = \frac{\int_{-\infty}^{\infty}x f(x, Y)\,dx}{\int_{-\infty}^{\infty}f(x, Y)\,dx}, \qquad (22.2)$$

where $X$, $Y$ are the current co-ordinates. Similarly there will be a line of regression of
$y$ on $x$ given by

$$Y = \frac{\int_{-\infty}^{\infty}y f(X, y)\,dy}{\int_{-\infty}^{\infty}f(X, y)\,dy}. \qquad (22.3)$$

We shall take $Y$ to represent the dependent variate throughout this chapter.


22.3. We may also consider the more general curves typified by

$$Y = \frac{\int_{-\infty}^{\infty}y^r f(X, y)\,dy}{\int_{-\infty}^{\infty}f(X, y)\,dy}, \qquad (22.4)$$

the regression now being of the $r$th moment of $y$ on $x$. If $r = 1$ we have the regression
of the first moment, or simply the regression. If $r = 2$ and $y$ is measured from the mean
we have the so-called scedastic curve of $y$ on $x$,

$$Y = \frac{\int_{-\infty}^{\infty}(y - \bar{y}_X)^2 f(X, y)\,dy}{\int_{-\infty}^{\infty}f(X, y)\,dy}, \qquad (22.5)$$

which shows how the variance of $y$ varies with $x$. Other forms which have been studied
are the clitic curve

$$Y = \frac{\int_{-\infty}^{\infty}(y - \bar{y}_X)^3 f(X, y)\,dy}{\int_{-\infty}^{\infty}f(X, y)\,dy} \qquad (22.6)$$

and the kurtic curve

$$Y = \frac{\int_{-\infty}^{\infty}(y - \bar{y}_X)^4 f(X, y)\,dy}{\int_{-\infty}^{\infty}f(X, y)\,dy}. \qquad (22.7)$$


These curves correspond to the moments of a univariate distribution, and the main 
characteristics of a bivariate form may be studied with their aid in much the same way 
as the lower moments can be used to summarise the properties of a univariate form. 
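
When the frequency function is known, the regression and scedastic curves can be evaluated by direct numerical integration. The following is a sketch only: the function names, the choice of a bivariate normal kernel with σ₁ = 2, σ₂ = 3, ρ = 0.6, the integration range and the trapezoidal rule are all assumptions made for illustration. Since the curves are ratios of integrals, the normalising constant of the frequency function may be omitted.

```python
from math import exp


def normal_kernel(x, y, s1=2.0, s2=3.0, rho=0.6):
    """Unnormalised bivariate normal frequency function about the mean."""
    q = (x / s1) ** 2 - 2.0 * rho * x * y / (s1 * s2) + (y / s2) ** 2
    return exp(-q / (2.0 * (1.0 - rho ** 2)))


def array_moment(f, X, r, about=0.0, lo=-40.0, hi=40.0, steps=4000):
    """r-th moment of y (about a chosen origin) in the array x = X,
    i.e. the ratio of integrals over y, by the trapezoidal rule."""
    h = (hi - lo) / steps
    numerator = denominator = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        fy = f(X, y)
        numerator += weight * (y - about) ** r * fy
        denominator += weight * fy
    return numerator / denominator


# Ordinate of the regression curve of y on x at X = 1.5 ...
mean_at_X = array_moment(normal_kernel, 1.5, 1)
# ... and of the scedastic curve, the second moment about the array mean.
variance_at_X = array_moment(normal_kernel, 1.5, 2, about=mean_at_X)
```

For the normal kernel chosen, the first ordinate should agree with the linear regression ρσ₂X/σ₁ and the second with the constant array variance σ₂²(1 − ρ²).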


22.4. It is interesting to remark that, just as we can find the moments direct from 
the characteristic function, so also we may ascertain the regressions of moments from 
the bivariate characteristic function, even when the distribution function itself is not 
explicitly given. 

Let us write the frequency function in the form

$$f(x, y) = g(x)\,g_x(y), \qquad (22.8)$$

where $g(x)$ is the total frequency for any given $x$ and $g_x(y)$ is the frequency of $y$ for any
given $x$. In the notation of the theory of probability we should write this

$$f(x, y) = g(x)\,g(y \mid x).$$

The characteristic function of $x$ and $y$ is then

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp\{it_1x + it_2y\}\,g(x)\,g_x(y)\,dx\,dy = \int_{-\infty}^{\infty}e^{it_1x}g(x)\,\phi_x(t_2)\,dx, \qquad (22.9)$$

where

$$\phi_x(t_2) = \int_{-\infty}^{\infty}e^{it_2y}g_x(y)\,dy \qquad (22.10)$$

and is the c.f. of $y$ for a given $x$.

If the $r$th moment of $y$ about the origin for a given $x$ is $\mu'_{rx}$, we have

$$\mu'_{rx} = i^{-r}\left[\frac{\partial^r}{\partial t_2^r}\phi_x(t_2)\right]_{t_2=0},$$

and hence, from (22.9),

$$\left[\frac{\partial^r}{\partial t_2^r}\phi(t_1, t_2)\right]_{t_2=0} = i^r\int_{-\infty}^{\infty}e^{it_1x}g(x)\,\mu'_{rx}\,dx. \qquad (22.11)$$

Thus, by the Inversion Theorem,

$$g(x)\,\mu'_{rx} = \frac{(-i)^r}{2\pi}\int_{-\infty}^{\infty}e^{-it_1x}\left[\frac{\partial^r}{\partial t_2^r}\phi(t_1, t_2)\right]_{t_2=0}dt_1, \qquad (22.12)$$

subject, of course, to conditions of existence. This gives us the required expression for
$\mu'_{rx}$ in terms of $x$, and the regression can be written down at once.


22.5. Since

$$\phi(t_1, t_2) = \exp\left\{\sum_{r,s}\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}\right\},$$

we have

$$\left[\frac{\partial}{\partial t_2}\phi(t_1, t_2)\right]_{t_2=0} = i\,\phi(t_1, 0)\sum_{j=0}^{\infty}\kappa_{j1}\frac{(it_1)^j}{j!}, \qquad (22.13)$$

and $\phi(t_1, 0)$ may be written $\phi(t_1)$, being the characteristic function of $g(x)$. We also
have, subject to existence conditions,

$$\left(-\frac{d}{dx}\right)^j g(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}(it_1)^j e^{-it_1x}\phi(t_1)\,dt_1. \qquad (22.14)$$

Hence, from (22.12), (22.13) and (22.14) we find

$$g(x)\,\mu'_{1x} = \sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}\left(-\frac{d}{dx}\right)^j g(x), \qquad (22.15)$$

provided that the interchange of summation and integration in the last step is legitimate.
Thus we have, for the regression of the mean,

$$Y = \left[\frac{1}{g(x)}\sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}\left(-\frac{d}{dx}\right)^j g(x)\right]_{x=X}. \qquad (22.16)$$

This notable result is due to Wicksell (1934b). The expansion is valid if the cumulants
exist and if $g(x)$ and its derivatives are continuous in the range and zero at its extremes;
for then the interchange of summation and integration in arriving at (22.15) is legitimate.
In particular, if $g(x)$ is normal and in standard measure we have

$$Y = \sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}H_j(X), \qquad (22.17)$$

where $H_j(x)$ is the Tchebycheff-Hermite polynomial of order $j$ (6.20, vol. I, p. 145).




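Example 22.1

For the bivariate normal distribution about the mean we have

$$dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right]dx\,dy,$$

$$\phi(t_1, t_2) = \exp\{-\tfrac{1}{2}(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2)\}.$$

Hence

$$\left[\frac{\partial\phi}{\partial t_2}\right]_{t_2=0} = -\rho\sigma_1\sigma_2t_1\exp(-\tfrac{1}{2}\sigma_1^2t_1^2),$$

$$g(x)\,\mu'_{1x} = \frac{i}{2\pi}\int_{-\infty}^{\infty}\rho\sigma_1\sigma_2t_1\exp\{-\tfrac{1}{2}\sigma_1^2t_1^2 - it_1x\}\,dt_1 = \frac{\rho\sigma_2x}{\sigma_1^2\sqrt{(2\pi)}}\exp\left(-\frac{x^2}{2\sigma_1^2}\right).$$

Hence

$$\mu'_{1x} = \frac{\rho\sigma_2}{\sigma_1}x$$

and

$$Y = \frac{\rho\sigma_2}{\sigma_1}X,$$

the familiar relation of linearity for the regression of the mean of the normal distribution.
Alternatively, direct from (22.17) we have, since $\kappa_{j1} = 0$, $j > 1$,

$$\frac{Y}{\sigma_2} = \rho H_1\left(\frac{X}{\sigma_1}\right), \qquad Y = \frac{\rho\sigma_2}{\sigma_1}X, \text{ as before.}$$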


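Example 22.2 (Wicksell, 1934b)


Consider the frequency distribution of $\xi = \frac{1}{2}\Sigma(x^2)$ and $\eta = \frac{1}{2}\Sigma(y^2)$ where $x$, $y$ are
samples of $n$ from the bivariate normal population

$$dF \propto \exp\left\{-\frac{1}{2(1-\rho^2)}(x^2 - 2\rho xy + y^2)\right\}dx\,dy.$$

The characteristic function is

$$\phi(t_1, t_2) = \int\cdots\int\exp(\tfrac{1}{2}\Sigma x^2\,\theta_1 + \tfrac{1}{2}\Sigma y^2\,\theta_2)\,dF = \{(1 - \theta_1)(1 - \theta_2) - \rho^2\theta_1\theta_2\}^{-\frac{n}{2}},$$

where $\theta_1 = it_1$ and $\theta_2 = it_2$.
The distribution function cannot be expressed in a simple form, but we may determine
the regressions without it. We have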


Eat = (5 ie yd = a oe yr, 


Thus, from (22.12) 


n [r] 8 
» (Ftr—1) #1 — (1 — pH) 


—w Cee ue 


9 (8) tre = S| 




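The integrals may be evaluated by successive application of

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-\theta\xi}\,dt}{(1 - \theta)^k} = \frac{\xi^{k-1}e^{-\xi}}{\Gamma(k)}, \qquad \theta = it,$$

and we find, for the regression of $\eta$ on $\xi$,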


== (1 —e) {50 mee apr et. 
Thus the regressions of both mean and variance of $\eta$ on $\xi$ are linear.


Fitting of Curvilinear Regression Lines 

22.6. From the practical point of view the case we have just considered, namely, 
the one where the distribution or characteristic function is given, is exceptional. The 
determination of regression curves has, in the majority of cases, to be carried out from 
numerically specified material, which we shall consider in the remainder of the chapter. 
We shall confine our attention to the regression of the mean. 

In general the means of arrays will not lie exactly on a smooth curve (unless of course
we choose a curve of order equal to the number of points to be fitted, less one). Nor do
we know a priori what is the appropriate degree of a polynomial which will approximately
represent the regression line. Let us, however, assume that the regression can
be represented by a polynomial of order $p$:

$$Y = a_0 + a_1X + a_2X^2 + \cdots + a_pX^p. \qquad (22.18)$$

We will consider later how the appropriate value of $p$ is to be determined in particular
cases. Our problem is to determine the coefficients $a$ from the data. As usual, we appeal
to the principle of least squares, that is to say, we find the values of the $a$'s which will
minimise

$$U = \Sigma(y - a_0 - a_1x - \cdots - a_px^p)^2, \qquad (22.19)$$

the summation extending over the sample values.

Differentiating with respect to $a_j$, we have

$$\Sigma(x^jy) - a_0\Sigma x^j - a_1\Sigma x^{j+1} - \cdots - a_p\Sigma x^{j+p} = 0,$$

and similar equations for $j = 0, \ldots, p$. Writing the moments without primes for simplicity
and letting $\mu_j$ represent the $j$th moment of $x$, and $\mu_{j1}$ the bivariate moment
of $x^jy$, we have

$$\begin{aligned}
a_0\mu_0 + a_1\mu_1 + \cdots + a_p\mu_p &= \mu_{01} \\
a_0\mu_1 + a_1\mu_2 + \cdots + a_p\mu_{p+1} &= \mu_{11} \\
\cdots \qquad & \\
a_0\mu_p + a_1\mu_{p+1} + \cdots + a_p\mu_{2p} &= \mu_{p1}
\end{aligned} \qquad (22.20)$$

Writing now

$$\Delta^{(p)} = \begin{vmatrix}
\mu_0 & \mu_1 & \cdots & \mu_p \\
\mu_1 & \mu_2 & \cdots & \mu_{p+1} \\
\vdots & \vdots & & \vdots \\
\mu_p & \mu_{p+1} & \cdots & \mu_{2p}
\end{vmatrix} \qquad (22.21)$$

and $\Delta_j^{(p)}$ for the determinant obtained by substituting the product-moments $\mu_{01}, \ldots, \mu_{p1}$
for the $(j + 1)$th column, we have, as the solution of (22.20),

$$a_j = \frac{\Delta_j^{(p)}}{\Delta^{(p)}}. \qquad (22.22)$$
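
The normal equations (22.20) are easily formed and solved numerically. The sketch below is an illustration under stated assumptions: the function name is hypothetical, the moments are taken as simple averages over the sample, and the equations are solved by Gaussian elimination with partial pivoting rather than by the determinantal form (22.22), to which it is algebraically equivalent.

```python
def fit_polynomial_by_moments(xs, ys, p):
    """Least-squares coefficients a_0 ... a_p of a polynomial of order p,
    obtained from the normal equations written in terms of moments."""
    n = len(xs)
    # mu[k]: k-th moment of x; mu_y[j]: bivariate moment of x^j y.
    mu = [sum(x ** k for x in xs) / n for k in range(2 * p + 1)]
    mu_y = [sum(x ** j * y for x, y in zip(xs, ys)) / n for j in range(p + 1)]

    m = p + 1
    # Augmented matrix: row i is (mu_i, mu_{i+1}, ..., mu_{i+p} | mu_{i1}).
    A = [[mu[i + j] for j in range(m)] + [mu_y[i]] for i in range(m)]

    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, m):
            factor = A[row][col] / A[col][col]
            for c in range(col, m + 1):
                A[row][c] -= factor * A[col][c]

    # Back substitution.
    a = [0.0] * m
    for i in reversed(range(m)):
        s = A[i][m] - sum(A[i][j] * a[j] for j in range(i + 1, m))
        a[i] = s / A[i][i]
    return a
```

As a check, values lying exactly on $Y = 1 + 2X + 3X^2$ at $x = 0, 1, \ldots, 4$ should return coefficients very close to 1, 2, 3 when $p = 2$.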


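22.7. It might appear that this solution could break down if $\Delta^{(p)} = 0$. Such a
thing is not possible, however, except in the most trivial case. In fact, if the distribution
function of the $x$'s is $G(x)$, we have for $\Delta^{(p)}$

$$\Delta^{(p)} = \int\cdots\int\begin{vmatrix}
1 & x_1 & x_2^2 & \cdots & x_p^p \\
x_0 & x_1^2 & x_2^3 & \cdots & x_p^{p+1} \\
\vdots & \vdots & \vdots & & \vdots \\
x_0^p & x_1^{p+1} & x_2^{p+2} & \cdots & x_p^{2p}
\end{vmatrix}\,dG_0\,dG_1 \ldots dG_p$$

or, if

$$D = \begin{vmatrix}
1 & x_0 & \cdots & x_0^p \\
1 & x_1 & \cdots & x_1^p \\
\vdots & \vdots & & \vdots \\
1 & x_p & \cdots & x_p^p
\end{vmatrix},$$

$$\Delta^{(p)} = \int\cdots\int x_1x_2^2 \ldots x_p^p\,D\,dG_0\,dG_1 \ldots dG_p.$$

If we now permute the suffixes of the $x$'s in all possible ways and sum the $(p + 1)!$ resultants
we obtain, in virtue of the definition of a determinant,

$$(p + 1)!\,\Delta^{(p)} = \int\cdots\int D^2\,dG_0\,dG_1 \ldots dG_p, \qquad (22.23)$$

and hence $\Delta^{(p)}$ is essentially positive.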


22.8. From (22.18) and (22.22) we see that the regression line may be written

$$\begin{vmatrix}
Y & 1 & X & \cdots & X^p \\
\mu_{01} & \mu_0 & \mu_1 & \cdots & \mu_p \\
\mu_{11} & \mu_1 & \mu_2 & \cdots & \mu_{p+1} \\
\vdots & \vdots & \vdots & & \vdots \\
\mu_{p1} & \mu_p & \mu_{p+1} & \cdots & \mu_{2p}
\end{vmatrix} = 0. \qquad (22.24)$$

This is a formal solution of our problem. The moments $\mu$ can be obtained from observation,
and equation (22.24) then gives the regression line.

It will be observed that in order to preserve the symmetry we have written $\mu_0$ for
the total frequency unity.


22.9. A somewhat different approach leads to the same solution. If we assume 
that the regression line is a parabolic curve of order p, we may find the coefficients by the 
principle of moments. This would lead us to identify the lower moments

$$\Sigma(x^jy) = \Sigma x^j(a_0 + a_1x + \cdots + a_px^p)$$

as far as was necessary to determine the $a$'s. This clearly leads back to equation (22.20).


Orthogonal Polynomials 


22.10. The use of equation (22.24) in practice is subject to one serious drawback. 
If we have a set of data and no guide, apart from inspection, to the appropriate value of 


ORTHOGONAL POLYNOMIALS 147 


p, the only course is to fit curves of order 1, 2, 3,. . . and so forth, until we reach the point 
when further terms do not improve the fit. Every time we add a new term the determin- 
antal arithmetic has to be done afresh. To obviate this nuisance we shall consider the 
regression line in the form 


$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p, \qquad (22.25)$$

where the P's are polynomials in X, $P_j$ being of degree j. We shall determine the P's
so that
$$\Sigma (P_j P_k) = 0, \quad j \neq k, \qquad (22.26)$$

the summation extending over the observed values.
In minimising
$$\Sigma (y - b_0 P_0 - b_1 P_1 - \ldots - b_p P_p)^2,$$
we shall have equations such as
$$\Sigma (yP_j) - b_0\, \Sigma (P_0 P_j) - \ldots - b_p\, \Sigma (P_p P_j) = 0,$$
and in virtue of the orthogonal relations (22.26), this reduces to
$$\Sigma (yP_j) - b_j\, \Sigma (P_j^2) = 0. \qquad (22.27)$$

Thus $b_j$ is determined simply by $P_j$; and if, having fitted a curve of order p, we wish to
go a step farther and add a term $b_{p+1} P_{p+1}$, the coefficients $b_0 \ldots b_p$ found from (22.27)
remain unaltered.
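The invariance of the earlier coefficients can be checked numerically. The following is a minimal sketch, assuming a small made-up data set and a plain Gram-Schmidt construction of the monic polynomials (the function names and the data are illustrative and are not taken from the text).

```python
def poly_value(coeffs, x):
    """Evaluate a polynomial with ascending coefficients at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


def orthogonal_polys(xs, max_degree):
    """Monic polynomials P_0..P_max, orthogonal over the observed xs as in (22.26)."""
    polys = []
    for j in range(max_degree + 1):
        coeffs = [0.0] * j + [1.0]  # start from X^j
        for q in polys:
            # remove the component of X^j along each earlier polynomial
            num = sum(x ** j * poly_value(q, x) for x in xs)
            den = sum(poly_value(q, x) ** 2 for x in xs)
            for k, c in enumerate(q):
                coeffs[k] -= num / den * c
        polys.append(coeffs)
    return polys


def fit(xs, ys, degree):
    """Coefficients b_j = sum(y P_j) / sum(P_j^2), equation (22.27)."""
    polys = orthogonal_polys(xs, degree)
    return [
        sum(y * poly_value(p, x) for x, y in zip(xs, ys))
        / sum(poly_value(p, x) ** 2 for x in xs)
        for p in polys
    ]


# Illustrative data only (not from the text).
xs = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
ys = [2.1, 2.9, 4.8, 5.1, 7.4, 7.9, 10.3]
b_quadratic = fit(xs, ys, 2)
b_cubic = fit(xs, ys, 3)
print(b_quadratic)
print(b_cubic[:3])  # the first three coefficients are unchanged
```

Adding the cubic term leaves the first three coefficients as they were, which is the property that the power-series form of the regression line lacks.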


22.11. Furthermore, the use of these orthogonal polynomials will give us a very 
convenient method of determining step by step the goodness of fit of the regression line. 
We have 


$$U = \Sigma (y - b_0 P_0 - \ldots - b_p P_p)^2$$

$$= \Sigma (y^2) - 2b_0\, \Sigma (yP_0) - \ldots - 2b_p\, \Sigma (yP_p) + b_0^2\, \Sigma (P_0^2) + \ldots + b_p^2\, \Sigma (P_p^2).$$
But from (22.27) we may express $\Sigma (yP_j)$ in terms of $\Sigma (P_j^2)$, and we thus find
$$U = \Sigma (y^2) - b_0^2\, \Sigma (P_0^2) - \ldots - b_p^2\, \Sigma (P_p^2). \qquad (22.28)$$

Thus the effect of any term $b_j P_j$ is to reduce U by $b_j^2\, \Sigma (P_j^2)$ and we may examine the effect
of this term on U separately. If we find that the addition of any term $b_p P_p$ does not
reduce U significantly, we may conclude that it is redundant (so far as concerns the
representation of a regression line by a polynomial).


22.12. We proceed then to derive expressions for the orthogonal polynomials in the
general case. Later we shall examine the important special case when the values of x
are equidistant (as, for instance, with grouped data and most time-series).

Put

$$P_p = \sum_{j=0}^{p} c_{pj} X^j. \qquad (22.29)$$
In this expression there are (p + 1) unknown constants c, and hence in all the polynomials
up to and including those of the pth order there are ½(p + 1)(p + 2) constants. The
orthogonal relations up to and including order p will then provide ½p(p + 1) conditions



on the c's, so that p + 1 constants are assignable at will. We will take one for each P and
assign it so that the coefficient of $X^j$ in $P_j$ is unity:

$$c_{jj} = 1. \qquad (22.30)$$

In particular $c_{00} = P_0 = 1$. The orthogonal relations are then just sufficient to determine
the other c's. For instance, for the set $c_{pj}$, j = 0 ... p − 1, they are

$$\Sigma P_p P_0 = \Sigma P_p = 0$$

$$\Sigma P_p P_1 = \Sigma P_p (c_{10} + X) = 0$$
and so on. This system is clearly equivalent to the p equations
$$\Sigma P_p = 0, \quad \Sigma X P_p = 0, \quad \ldots, \quad \Sigma X^{p-1} P_p = 0. \qquad (22.31)$$

On substituting for the P's from (22.29) we get

$$c_{p0}\, \mu_0 + c_{p1}\, \mu_1 + \ldots + c_{p,p-1}\, \mu_{p-1} + \mu_p = 0$$
$$c_{p0}\, \mu_1 + c_{p1}\, \mu_2 + \ldots + c_{p,p-1}\, \mu_p + \mu_{p+1} = 0$$
$$\cdots$$
$$c_{p0}\, \mu_{p-1} + c_{p1}\, \mu_p + \ldots + c_{p,p-1}\, \mu_{2p-2} + \mu_{2p-1} = 0.$$


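The solution may be expressed as a determinant in the usual way. Writing $\Delta^{(p-1)}$ in accordance
with (22.21) and $\Delta^{(p)}_{j}$ for the minor of the term in the last row and (j + 1)th column
in (22.21), we find

$$c_{pj} = \frac{\Delta^{(p)}_{j}}{\Delta^{(p-1)}}. \qquad (22.32)$$

This expresses the c's in terms of the ascertainable constants $\mu$. It follows that

$$P_p = \frac{1}{\Delta^{(p-1)}} \begin{vmatrix} \mu_0 & \mu_1 & \cdots & \mu_p \\ \mu_1 & \mu_2 & \cdots & \mu_{p+1} \\ \vdots & \vdots & & \vdots \\ \mu_{p-1} & \mu_p & \cdots & \mu_{2p-1} \\ 1 & X & \cdots & X^{p} \end{vmatrix}. \qquad (22.33)$$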
We notice in particular that, in virtue of the diagonal symmetry of $\Delta^{(p)}$, we have
Ci = Cg : : : : . (22.34) 
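22.13. In virtue of (22.31) we have
$$\Sigma (P_p^2) = \Sigma (X^p P_p)$$
and thus, from (22.33) on multiplying the last row and summing,

$$\Sigma (P_p^2) = \frac{n\, \Delta^{(p)}}{\Delta^{(p-1)}}. \qquad (22.35)$$
Similarly
$$\Sigma (yP_p) = \frac{n}{\Delta^{(p-1)}} \begin{vmatrix} \mu_0 & \mu_1 & \cdots & \mu_p \\ \vdots & \vdots & & \vdots \\ \mu_{p-1} & \mu_p & \cdots & \mu_{2p-1} \\ \mu_{01} & \mu_{11} & \cdots & \mu_{p1} \end{vmatrix}. \qquad (22.36)$$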




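Finally, from (22.27),
$$b_p = \frac{\Sigma (yP_p)}{\Sigma (P_p^2)}, \qquad (22.37)$$
that is, the ratio of the right-hand sides of (22.36) and (22.35).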

Our problem is now solved. We have expressed all the unknowns in terms of 
calculable determinants. 

We may note in passing that since the regression equation must remain covariant
under a change of origin, all the coefficients b except $b_0$ are seminvariant, and the origin
can thus be chosen at will. $b_0$ itself is the mean of the y-values.


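22.14. Explicitly for the polynomials we have (taking $\mu_1 = 0$, $\mu_2 = 1$):

$$P_0 = 1 \qquad (22.38)$$

$$P_1 = \begin{vmatrix} 1 & 0 \\ 1 & X \end{vmatrix} = X \qquad (22.39)$$

$$P_2 = \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & X & X^2 \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = X^2 - \mu_3 X - 1 \qquad (22.40)$$

$$P_3 = \begin{vmatrix} 1 & 0 & 1 & \mu_3 \\ 0 & 1 & \mu_3 & \mu_4 \\ 1 & \mu_3 & \mu_4 & \mu_5 \\ 1 & X & X^2 & X^3 \end{vmatrix} \Bigg/ \begin{vmatrix} 1 & 0 & 1 \\ 0 & 1 & \mu_3 \\ 1 & \mu_3 & \mu_4 \end{vmatrix}$$

$$= \frac{1}{\mu_4 - \mu_3^2 - 1} \{ (\mu_4 - \mu_3^2 - 1) X^3 - (\mu_5 - \mu_3 \mu_4 - \mu_3) X^2
+ (\mu_3 \mu_5 - \mu_4^2 + \mu_4 - \mu_3^2) X + (\mu_5 - 2\mu_3 \mu_4 + \mu_3^3) \} \qquad (22.41)$$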


and so on. In particular, if the population is normal ($\mu_3 = 0$, $\mu_4 = 3$, $\mu_5 = 0$),

$$P_2 = X^2 - 1, \qquad P_3 = X^3 - 3X,$$

the polynomials in this case reducing to the Tchebycheff-Hermite functions (6.20) which
we know to form an orthogonal system in the normal case.
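As a check on (22.40) and (22.41), the coefficients can be evaluated for given standardised moments. This is a minimal sketch, assuming moments about the mean in standard measure ($\mu_1 = 0$, $\mu_2 = 1$); the function name is illustrative.

```python
def standardised_polys(mu3, mu4, mu5):
    """Ascending coefficients of P_2 and P_3 from (22.40) and (22.41)."""
    d = mu4 - mu3 ** 2 - 1.0
    p2 = [-1.0, -mu3, 1.0]
    p3 = [
        (mu5 - 2.0 * mu3 * mu4 + mu3 ** 3) / d,
        (mu3 * mu5 - mu4 ** 2 + mu4 - mu3 ** 2) / d,
        -(mu5 - mu3 * mu4 - mu3) / d,
        1.0,
    ]
    return p2, p3


# Normal population: mu3 = 0, mu4 = 3, mu5 = 0.
p2, p3 = standardised_polys(0.0, 3.0, 0.0)
print(p2)  # coefficients of X^2 - 1
print(p3)  # coefficients of X^3 - 3X
```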


Example 22.3. Ungrouped Data 
Table 22.1 shows the relationship between the percentage loss in weight (Y) and the 
temperature (X) in a number of samples of soil. We require to find the regression of Y on X. 




TABLE 22.1 


Fitting of Curvilinear Regression for Ungrouped Data 
(Data from J. R. H. Coutts, J. Agr. Sci., 20, 541.) 


Percentage Loss Temperature 
(degrees F.). 
x 


For the sums required we find:
n = 16, $\Sigma (y) = 82.97$, $\Sigma (y^2) = 459.4363$;
$\Sigma (x) = 2642$, $\Sigma (x^2) = 474,050$, $\Sigma (x^3) = 91,244,582$;
$\Sigma (x^4) = 18,553,164,842$, $\Sigma (x^5) = 3,930,294,225,302$;
$\Sigma (x^6) = 858,077,668,755,250$; $\Sigma (yx) = 14,736.19$;
$\Sigma (yx^2) = 2,819,909.45$, $\Sigma (yx^3) = 571,902,362.11$.
These can be run off fairly quickly on a machine. We have not bothered to take a different
mean from those given, but in general a certain amount of arithmetic can be saved by
so doing.
Considering first of all the straightforward approach of (22.24), we have for the straight
line of closest fit,

$$\begin{vmatrix} Y & 1 & X \\ 82.97 & 16 & 2642 \\ 14,736.19 & 2642 & 474,050 \end{vmatrix} = 0,$$

reducing to

$$Y = 0.660 + 2.741 \left( \frac{X}{100} \right). \qquad (22.42)$$

We have put $n\mu_j$ instead of $\mu_j$ in the second and third rows of the determinant, as we are
clearly entitled to do.
Similarly we find for the second- and third-order parabolas:

$$Y = 3.55 - 0.929 \left( \frac{X}{100} \right) + 1.070 \left( \frac{X}{100} \right)^2 \qquad (22.43)$$

$$Y = 7.783 - 8.940 \left( \frac{X}{100} \right) + 5.875 \left( \frac{X}{100} \right)^2 - 0.9189 \left( \frac{X}{100} \right)^3. \qquad (22.44)$$
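The straight line (22.42) can be verified directly from the sums quoted above. A minimal sketch, assuming the usual least-squares expressions for slope and intercept, which are equivalent to expanding the determinant (variable names are illustrative):

```python
# Sums quoted in the text for the soil data (n = 16).
n = 16
sum_y = 82.97
sum_x = 2642.0
sum_xx = 474050.0
sum_xy = 14736.19

slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
intercept = (sum_y - slope * sum_x) / n

# The text reports the line with X in units of 100 degrees.
print(round(intercept, 3), round(100 * slope, 3))
```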




Fig. 22.1 shows the straight line and cubic fitted to the data by these means. An examina- 
tion of the coefficients in the equations illustrates the point made above, that as successive 
terms are added to the polynomials the coefficients of all terms may alter very considerably. 


Fig. 22.1. Straight Line and Cubic Parabola of Closest Fit to the Data of Table 22.1 (percentage loss in weight plotted against temperature in degrees).


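Consider now the alternative approach by the use of orthogonal polynomials. By
the use of equations (22.33) we have

$$P_1 = \frac{1}{16} \begin{vmatrix} 16 & 2642 \\ 1 & X \end{vmatrix} = X - 165.125.$$

$$P_2 = \begin{vmatrix} 16 & 2642 & 474,050 \\ 2642 & 474,050 & 91,244,582 \\ 1 & X & X^2 \end{vmatrix} \Bigg/ \begin{vmatrix} 16 & 2642 \\ 2642 & 474,050 \end{vmatrix}$$

$$= X^2 - 343.137X + 27,032.435.$$

$$P_3 = \begin{vmatrix} 16 & 2642 & 474,050 & 91,244,582 \\ 2642 & 474,050 & 91,244,582 & 18,553,164,842 \\ 474,050 & 91,244,582 & 18,553,164,842 & 3,930,294,225,302 \\ 1 & X & X^2 & X^3 \end{vmatrix} \Bigg/ \begin{vmatrix} 16 & 2642 & 474,050 \\ 2642 & 474,050 & 91,244,582 \\ 474,050 & 91,244,582 & 18,553,164,842 \end{vmatrix}$$

$$= X^3 - 522.940X^2 + 87,182.434X - 4,605,047.$$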
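The determinantal arithmetic can be replaced by solving the p linear equations of (22.31) for the lower coefficients of each polynomial. The sketch below assumes exact rational arithmetic to avoid rounding trouble with the large power sums; the helper names `solve` and `monic_orthogonal` are illustrative and do not come from the text.

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a c = b exactly by Gauss-Jordan elimination."""
    m = len(b)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, b)]
    for i in range(m):
        pivot = next(r for r in range(i, m) if rows[r][i] != 0)
        rows[i], rows[pivot] = rows[pivot], rows[i]
        rows[i] = [v / rows[i][i] for v in rows[i]]
        for r in range(m):
            if r != i:
                f = rows[r][i]
                rows[r] = [v - f * w for v, w in zip(rows[r], rows[i])]
    return [row[-1] for row in rows]


def monic_orthogonal(power_sums, p):
    """Coefficients c_p0 .. c_p,p-1 of P_p from the sums S_k = sum(x^k), as in (22.31)."""
    a = [[power_sums[i + j] for j in range(p)] for i in range(p)]
    b = [-power_sums[i + p] for i in range(p)]
    return solve(a, b)


# Power sums of x for the soil data, S_0 .. S_5, as quoted in the text.
S = [16, 2642, 474050, 91244582, 18553164842, 3930294225302]
for p in (1, 2, 3):
    print(p, [float(c) for c in monic_orthogonal(S, p)])
```

Each list gives the constant term first; the leading coefficient is unity by (22.30).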
The b-coefficients are given by (22.37), the determinants in the numerator having been
already tabulated in finding the P's. We have

$$b_0 = 5.1856, \quad b_1 = \frac{2.7409}{100}, \quad b_2 = \frac{1.0695}{100^2}, \quad b_3 = -\frac{0.91889}{100^3},$$




these being the values already found in arriving at (22.42) to (22.44). Thus

$$Y = 5.1856 + \frac{2.7409}{100} (X - 165.125) + \frac{1.0695}{100^2} (X^2 - 343.137X + 27,032.4)
- \frac{0.91889}{100^3} (X^3 - 522.940X^2 + 87,182.4X - 4,605,047). \qquad (22.45)$$

If we stop at the second term we have

$$Y = 5.1856 + \frac{2.7409}{100} (X - 165.125)$$

which is the same as (22.42), as of course it must be. Similarly, if we stop at the third or
fourth terms we find equations (22.43) or (22.44).
Now consider the fit of the regression line. We have from (22.35),

$$b_p^2\, \Sigma (P_p^2) = n b_p^2\, \frac{\Delta^{(p)}}{\Delta^{(p-1)}} = b_p\, \Sigma (yP_p).$$

The determinants in this expression have already been evaluated in finding the regression
line. Remembering that $\Sigma (y^2) = 459.436$ we obtain the following:

  j      b_j                     n b_j² Δ^(j)/Δ^(j−1)      U (equation (22.28))
  0      5.1856                  430.247                   29.189
  1      2.7409 × 10⁻²           28.390                    0.799
  2      1.0695 × 10⁻⁴           0.669                     0.130
  3      −0.91889 × 10⁻⁶         0.080                     0.050

In calculations of this kind it is as well to take $b_j$ to an extra place of decimals, as the value
of U is rather sensitive to small errors of rounding up. Even so, the last figure in U is
unreliable.

From the values of U it is clear that the fit is greatly improved by taking a quadratic 
term, and still further improved by adding the cubic term. How far a quartic term would 
improve matters cannot be decided without ascertaining the term. We have, however, 
not proceeded beyond the third degree because to do so would require moments of the 
eighth order. For a small population such as this, which in practical applications would 
be considered as a sample only, the errors in higher moments would probably be considerable. 

The reader who works through the arithmetic of this example will find that there is 
about the same labour involved in either method. It is in the fitting of higher order terms 
that the method of orthogonal polynomials shows its superiority. In practical cases it 
is preferable to avoid the large numbers arising from the evaluation of determinants by 
a modification of the procedure given in 22.27 below. 


Example 22.4. Grouped Data 


In Example 14.1 (vol. I, p. 331) we considered the correlation between age and highest 


audible pitch in 3379 subjects and found the linear regressions. Let us take the work 
a stage further. 




For the data of the table (X = age, Y = pitch) we find— 


$\Sigma (y) = -708$; $\Sigma (y^2) = 8894$; $\Sigma (yx) = -12,535$;
Se eeweoa 2 (2°) — 47.5925 3% (2°) = 387,498 - 
$\Sigma (x^4) = 4,842,172$; $\Sigma (x^5) = 62,401,794$; $\Sigma (x^6) = 883,576,012$.


As a variation on the procedure of the previous example, we will convert these figures 
to moments about the mean (with Sheppard’s corrections) and put them in standard measure. 
We find— 


$\mu'_{01} = -0.209,529$; $\mu_{02} = 2.504,904$;
$\mu'_{10} = 0.770,642$; $\mu_{20} = 13.348,229$.

In standard measure the other moments are

$\mu_3 = 1.705,375$; $\mu_4 = 6.295,759$;
$\mu_5 = 20.729,861$; $\mu_6 = 78.409,775$.

We may now use equations (22.38), etc., direct, and find
$P_0 = 1$, $P_1 = X$, $P_2 = X^2 - 1.705X - 1$, $P_3 = X^3 - 3.471X^2 - 0.376X + 3.560$.
We now require the moments $\mu_{21}$ and $\mu_{31}$. We find

$\Sigma (yx^2) = -112,495$
$\Sigma (yx^3) = -1,399,639$,

and hence, with Sheppard's corrections and in standard measure,

$\mu_{21} = -1.177,920$, $\mu_{31} = -4.215,958$.
We now find, from (22.37),
$b_1 = -0.613,626$
$b_2 = -0.055,064$
$b_3 = 0.010,205$.
The regression line of the third degree is then
$$Y = -0.6136X - 0.0551 (X^2 - 1.705X - 1) + 0.0102 (X^3 - 3.471X^2 - 0.376X + 3.560),$$


where the origin is at the mean and the units are in standard measure. 
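The second coefficient can be checked from the standardised moments alone. A minimal sketch, assuming that in standard measure the product-moment of y with X equals $b_1$ and that the mean of y is zero with the origin at the mean, so that $\Sigma (yP_2)/n$ is the moment of y with $X^2$ less $\mu_3$ times the moment of y with X, and $\Sigma (P_2^2)/n = \mu_4 - \mu_3^2 - 1$ (variable names are illustrative):

```python
# Standardised moments quoted in the text for the age and pitch data.
mu3 = 1.705375
mu4 = 6.295759
m_y_x = -0.613626    # product-moment of y with X (equal to b_1 in standard measure)
m_y_x2 = -1.177920   # moment of y with X^2

b1 = m_y_x
# numerator: mean of y * P_2 with P_2 = X^2 - mu3 X - 1 and mean(y) = 0
# denominator: mean of P_2^2
b2 = (m_y_x2 - mu3 * m_y_x) / (mu4 - mu3 ** 2 - 1.0)
print(round(b1, 4), round(b2, 4))
```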


Standard Errors of Regression Coefficients 

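22.15. The standard errors of unknowns derived from least squares can be found
by the use of a result due originally to Gauss. Suppose $\alpha_j$ is the true value of $a_j$ and the
residuals $y - \Sigma \alpha_j x^j$ are distributed normally with variance v. Writing $da_j = a_j - \alpha_j$,
we have for the frequency function of the residuals:

$$f \propto \exp \Big\{ -\frac{1}{2v} \sum_s \Big( y - \sum_j \alpha_j x^j \Big)^2 \Big\}$$

$$\propto \exp \Big\{ -\frac{1}{2v} \sum_s \Big( y - \sum_j a_j x^j + \sum_j da_j\, x^j \Big)^2 \Big\}$$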




($\sum_s$ denoting summation over the sample and $\sum_j$ over the values $a_0$ to $a_p$, and the cross-term
vanishing because the a's are minimal values);

$$\propto \text{constant} \times \exp \Big\{ -\frac{1}{2v} \sum_s \Big( \sum_j da_j\, x^j \Big)^2 \Big\}$$

$$\propto \exp \Big\{ -\frac{1}{2v} \sum_s \sum_{j,k} (da_j\, da_k\, x^{j+k}) \Big\}$$

$$\propto \exp \Big\{ -\frac{n}{2v} \sum_{j,k} (da_j\, da_k\, \mu_{j+k}) \Big\}. \qquad (22.46)$$

In the limit, then, the deviations are distributed in the multivariate normal form, and from
the results of 15.12 (vol. I, p. 376) it follows that

$$\operatorname{var} a_j = \frac{v}{n} \cdot \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}}, \qquad (22.47)$$

for the determinant whose terms are $\mu_{j+k}$ is in fact the determinant we have already defined
as $\Delta^{(p)}$, and $\Delta^{(p)}_{jj}$ is the minor of the item in the jth row and column.

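Now v is the variance of deviations from the theoretical regression line, and in terms
of variations about the observed line we have, remembering the result of 18.17:

$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}} \cdot \frac{\operatorname{var} e}{n - p - 1}. \qquad (22.48)$$

Since the correlation ratio of y on x is given by
$$\operatorname{var} e = \operatorname{var} y\, (1 - \eta^2),$$
we have also
$$\operatorname{var} a_j = \frac{\Delta^{(p)}_{jj}}{\Delta^{(p)}} \cdot \frac{(1 - \eta^2) \operatorname{var} y}{n - p - 1}. \qquad (22.49)$$

For large samples the replacement of n by n − p − 1 in the denominator is an unnecessary
refinement.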


22.16. For the case of orthogonal polynomials the results apply with a slight but
important simplification. The coefficient $b_j$ is the same as $a_j$ if polynomials up to order j
only are fitted, and hence, since $\Delta^{(j)}_{jj} = \Delta^{(j-1)}$, we have

$$\operatorname{var} b_j = \frac{\Delta^{(j-1)}}{\Delta^{(j)}} \cdot \frac{(1 - \eta^2) \operatorname{var} y}{n - j - 1}. \qquad (22.50)$$

The same result follows by modifying (22.46), which for orthogonal polynomials becomes

$$f \propto \exp \Big\{ -\frac{1}{2v} \sum_j (db_j)^2\, \Sigma (P_j^2) \Big\}, \qquad (22.51)$$

showing that the b's are independently and normally distributed with variance

$$\operatorname{var} b_j = \frac{v}{\Sigma (P_j^2)},$$
reducing to (22.50) in virtue of (22.35).
22.17. If the parent population is normal, $\eta = \rho$, and the determinants $\Delta$ can be
evaluated explicitly in terms of the variance of x. In fact,

$$\frac{\Delta^{(j-1)}}{\Delta^{(j)}} = \frac{1}{j!\, (\operatorname{var} x)^j} \qquad (22.52)$$




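and hence

$$\operatorname{var} b_j = \frac{(1 - \rho^2) \operatorname{var} y}{(n - j - 1)\, j!\, (\operatorname{var} x)^j} \qquad (22.53)$$
or, in standard measure,

$$\operatorname{var} b_j = \frac{1 - \rho^2}{(n - j - 1)\, j!}. \qquad (22.54)$$
Equation (22.52) can be found by evaluating the determinants in the ordinary way, but
it follows more simply from the consideration that $\Delta^{(j)} / \Delta^{(j-1)}$ is equal to
$\frac{1}{n} \Sigma P_j^2$, which, in the normal case, is for large samples equal to
$E (P_j^2) = j!\, (\operatorname{var} x)^j$ (6.22, vol. I, p. 147, with a change of scale).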


22.18. The advantages of using orthogonal polynomials instead of powers of X 
are apparent in the forms taken by the standard errors of the coefficients a and b. The
latter are independent of the order of the polynomial fitted and can be tested once and for 
all. The former do not possess this advantage. It seems preferable, therefore, as a matter 
of technique, to work with orthogonal polynomials throughout, whenever regressions of 
order higher than the first are likely to require investigation. 


Example 22.5 

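Consider again the data of Example 22.4 (regression of highest audible pitch on age).
We have there expressed the regression line in standard measure and in the orthogonal
form, and may therefore use equation (22.50) in the form

$$\operatorname{var} b_1 = \frac{1 - \eta^2}{n} \cdot \frac{\Delta^{(0)}}{\Delta^{(1)}}, \quad
\operatorname{var} b_2 = \frac{1 - \eta^2}{n} \cdot \frac{\Delta^{(1)}}{\Delta^{(2)}}, \quad
\operatorname{var} b_3 = \frac{1 - \eta^2}{n} \cdot \frac{\Delta^{(2)}}{\Delta^{(3)}}.$$

(The sample number n is so large that we can ignore the element − (j + 1) in the divisor.)
The determinants required are already known, having been ascertained in the course of
the work. We have
$$\frac{\Delta^{(0)}}{\Delta^{(1)}} = 1, \quad \frac{\Delta^{(1)}}{\Delta^{(2)}} = 0.4189, \quad \frac{\Delta^{(2)}}{\Delta^{(3)}} = 0.0985.$$
We also require $\eta$, which was found in Example 14.11 (vol. I, p. 352) to be $\eta_{yx} = 0.6231$.
Thus $1 - \eta^2 = 0.6117$. We find
$$\operatorname{var} b_1 = \frac{1.8104}{10^4}, \quad \operatorname{var} b_2 = \frac{0.7584}{10^4}, \quad \operatorname{var} b_3 = \frac{0.1783}{10^4}.$$

The values of the b's and their standard errors are then

  Coefficient.      Value.        Standard Error.
  b_1               −0.6136       0.013
  b_2               −0.0551       0.0087
  b_3               0.0102        0.0042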



In all cases we should judge the coefficients significant, as being more than twice the standard 
error. Although, therefore, the second- and third-order terms are small and the regression 
is approximately linear, the deviation from linearity is not merely a chance effect. 
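The standard errors of this example follow from a few lines of arithmetic. A minimal sketch, assuming the large-sample form of (22.50) in standard measure with divisor n (variable names are illustrative):

```python
import math

n = 3379                          # number of subjects
eta = 0.6231                      # correlation ratio quoted from Example 14.11
ratios = [1.0, 0.4189, 0.0985]    # determinant ratios for j = 1, 2, 3
b = [-0.6136, -0.0551, 0.0102]    # fitted coefficients from Example 22.4

variances = [(1.0 - eta ** 2) / n * r for r in ratios]
std_errors = [math.sqrt(v) for v in variances]
for coef, se in zip(b, std_errors):
    # a ratio above 2 is the rough criterion of significance used in the text
    print(coef, round(se, 4), round(abs(coef) / se, 1))
```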


Exact Significance Tests in the Normal Case
22.19. When the parent population is normal, more exact tests than those derived
from the use of standard errors may be obtained. We have already seen (14.21, vol. I,
p. 348) that a function dependent only on sample values and the first regression coefficient
$b_1$ was distributed in "Student's" form. We proceed to generalise this result.
Consider in the first place the linear regression equation
$$Y = \bar{y} + b_1 (X - \bar{x}) \qquad (22.55)$$
and let $\beta_1$ be the population value of $b_1$ and $\sigma_2^2$ the variance of y in the population. Since
the parent is normal, the variance of y for any fixed value of x is $\sigma_2^2$.
Our estimate of $\beta_1$ is
$$b_1 = \frac{\Sigma\, y (x - \bar{x})}{\Sigma (x - \bar{x})^2}, \qquad (22.56)$$
where summation takes place over the sample values. Thus for fixed values of x we have:
$$\operatorname{var} b_1 = \frac{\Sigma (x - \bar{x})^2 \operatorname{var} y}{\{ \Sigma (x - \bar{x})^2 \}^2} = \frac{\sigma_2^2}{\Sigma (x - \bar{x})^2}. \qquad (22.57)$$
Thus, since the mean of the distribution of $b_1$ is $\beta_1$, we see that, for samples having the
same x's as those observed, $b_1$ is normally distributed about mean $\beta_1$ with variance given
by (22.57); normally because it is a linear function of the y's which are themselves normal.
Consequently,

$$\frac{(b_1 - \beta_1) \sqrt{\Sigma (x - \bar{x})^2}}{\sigma_2} \qquad (22.58)$$
is distributed normally about zero mean with unit variance.
If $\sigma_2$ were known this would provide a test of significance of $b_1$ in the ordinary way;
but in fact $\sigma_2$ is not known and the substitution of an estimate distributed in the Type III
form brings in the t-distribution in the usual way. We take as our estimator of $\sigma_2$ the
function s, where

$$s^2 = \frac{1}{n - 2}\, \Sigma (y - Y')^2, \qquad (22.59)$$

and Y' represents the values "predicted" by the regression line, that is, the values
$$Y' = \bar{y} + b_1 (x - \bar{x}). \qquad (22.60)$$
Thus $s^2$ is based on the sum of squares of residuals. We shall show presently that $s^2$ is
distributed in the Type III form with n − 2 degrees of freedom independently of $b_1 - \beta_1$.
It follows that

$$t = \frac{(b_1 - \beta_1) \sqrt{\Sigma (x - \bar{x})^2}}{s}$$

is distributed as "Student's" t with $\nu = n - 2$.

A given value $\beta_1$ may be tested accordingly. But we notice that the inference is a
conditional one, that is to say, we are considering the distribution of t in a sub-population
for which the x's are the same as those actually observed. (Cf. 21.47.)




22.20. To establish the foregoing result we have to show that $\Sigma (y - Y')^2$, the sum
of squares of residuals about the observed regression line, is distributed in the Type III
form with n − 2 degrees of freedom. This is a particular case of a general theorem we
shall prove at the beginning of the next chapter, but we will sketch an ad hoc proof here
for the sake of completeness.

Since the population is normal, the deviations of y from the true regression line for
fixed x's, $Y = \beta_0 + \beta_1 (X - \bar{x})$, where $\beta_0$ is the parent mean of y, are normal with variance
$\sigma_2^2$. Now

$$\Sigma (y - Y')^2 = \Sigma \{ y - b_0 - b_1 (x - \bar{x}) \}^2$$

$$= \Sigma \{ y - \beta_0 - \beta_1 (x - \bar{x}) - (b_0 - \beta_0) - (b_1 - \beta_1) (x - \bar{x}) \}^2.$$
The coefficients $b_0$ and $b_1$ were chosen so as to minimise this sum, and hence

$$\frac{1}{\sigma_2^2} \Sigma (y - Y')^2 = \frac{1}{\sigma_2^2} \Sigma \{ y - \beta_0 - \beta_1 (x - \bar{x}) \}^2 - \frac{n (b_0 - \beta_0)^2}{\sigma_2^2} - \frac{(b_1 - \beta_1)^2}{\sigma_2^2} \Sigma (x - \bar{x})^2. \qquad (22.61)$$

The first term is the sum of squares of n normal variates with zero mean and unit variance;
the second is also such a variate, for it is the square of the deviation of the mean of y about
its true value divided by the variance $\sigma_2^2 / n$; and the third term is also such a variate, as
shown above.

It does not follow immediately that $\frac{1}{\sigma_2^2} \Sigma (y - Y')^2$ is distributed as the sum of squares of
n − 2 normal variates in standard measure, for the constituent items might be correlated.
Let us then find an orthogonal transformation to new variates $\xi_1 \ldots \xi_n$ linearly related
to the n normal variates $y - \beta_0 - \beta_1 (x - \bar{x})$. These also will be normally and independently
distributed. In particular (remembering that our summations refer to the
y's and x's, but the latter are constant for our distributions), take

$$\xi_1 = \frac{1}{\sigma_2 \sqrt{n}}\, \Sigma \{ y - \beta_0 - \beta_1 (x - \bar{x}) \} = \frac{\sqrt{n}}{\sigma_2} (b_0 - \beta_0)$$

$$\xi_2 = \frac{1}{\sigma_2}\, \Sigma \Big[ \frac{x - \bar{x}}{\sqrt{\Sigma (x - \bar{x})^2}} \{ y - \beta_0 - \beta_1 (x - \bar{x}) \} \Big] = \frac{1}{\sigma_2} (b_1 - \beta_1) \sqrt{\Sigma (x - \bar{x})^2}.$$
$\xi_1$ and $\xi_2$ are then normal variates in standard measure. Moreover they are orthogonal since
$$\Sigma\, \frac{1}{\sqrt{n}} \cdot \frac{x - \bar{x}}{\sqrt{\Sigma (x - \bar{x})^2}} = k\, \Sigma (x - \bar{x}) = 0.$$

Consequently our transformation exhibits the first term on the right in (22.61) as $\sum_{j=1}^{n} \xi_j^2$ and
the second and third as $\xi_1^2$ and $\xi_2^2$. Thus the total is distributed as $\sum_{j=3}^{n} \xi_j^2$, which is the
result required.




We may compare the result of 18.17, in which we saw that the mean value of $\varepsilon^2$
was n, whereas that of $e^2$ was n − p − 1, one degree of freedom having been lost in the
sum of squares of residuals for every constant estimated, and the approximate result of
21.20 in which $\chi^2$ had to lose a degree for each constant fitted by maximum likelihood.
Fundamentally all these results are different aspects of the same thing and rest on the fact 
that the variation of the sum of squares of normal variates in standard measure is spherically 
symmetric, so that a hyperplane in the sample space “ cuts” the distribution in a spheri- 
cally symmetric form of one lower degree of freedom. 


Extension to Curvilinear Regression 
22.21. The foregoing result can be extended without difficulty to the case when
the regression is curvilinear. If
$$Y = b_0 P_0 + b_1 P_1 + \ldots + b_p P_p,$$
where the P's are orthogonal, then

$$\operatorname{var} b_j = \frac{\sigma_2^2}{\Sigma P_j^2},$$

so that
$$\frac{(b_j - \beta_j) \sqrt{\Sigma P_j^2}}{\sigma_2}$$
is distributed normally with zero mean and unit variance. Taking as our estimate of $\sigma_2^2$

$$s^2 = \frac{1}{n - j - 1}\, \Sigma (y - Y')^2,$$

we see, as before, that
$$t = \frac{(b_j - \beta_j) \sqrt{(n - j - 1)}\, \sqrt{\Sigma P_j^2}}{\sqrt{\Sigma (y - Y')^2}} \qquad (22.62)$$
is distributed as "Student's" t with $\nu = n - j - 1$ degrees of freedom.
It will be observed that in this and the previous section we have not assumed anything
about the distribution in x-arrays. We have merely supposed that for any given x, y is
normally distributed with constant variance.


Example 22.6 

Consider again the soil data of Example 22.3. We found, for the cubic term in the
parabola, a coefficient of $-0.9189 \times 10^{-6}$. Is this significant?

Here $b_j - \beta_j = -0.9189 \times 10^{-6}$ for j = 3;

$\sqrt{(n - j - 1)} = \sqrt{(16 - 4)} = 3.464$.
We have already found $\Sigma (y - Y')^2 = U$, namely
U = 0.050.

We further require $\Sigma P_3^2$ which has been obtained incidentally in the working of Example
22.3 and is equal to $9.31525 \times 10^{10}$. Hence
$$|t| = \frac{0.9189 \times 10^{-6}\, (3.464)\, (3.052 \times 10^{5})}{0.2236} = 4.3.$$
This is highly significant.
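The arithmetic of (22.62) for this example can be reproduced as follows. A minimal sketch using only the quantities quoted above, with the hypothetical value $\beta_3 = 0$ (variable names are illustrative):

```python
import math

b3 = -0.9189e-6          # cubic coefficient from Example 22.3
n, j = 16, 3             # sample size and order of the term tested
U = 0.050                # residual sum of squares after the cubic term
sum_p3_sq = 9.31525e10   # sum of squares of P_3 over the sample

t = abs(b3) * math.sqrt(n - j - 1) * math.sqrt(sum_p3_sq) / math.sqrt(U)
print(round(t, 1), "on", n - j - 1, "degrees of freedom")
```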




Case when the Independent Variate proceeds by Equal Steps 

22.22. An important special case arises when the independent variate has values
which are equidistant, as, for instance, in most time-series and in grouped data. If we take
the interval between successive values of x as our unit, the variate-values may, by a suitable
choice of origin, be taken as 0, 1, 2, ... n − 1. The various moment-functions
$\mu_r$ entering into the expressions for polynomials, etc., may be written down once for
all. Furthermore, this case lends itself to simpler summatory methods of forming the
actual polynomial values and the residuals.


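22.23. For a set of values 0, 1, 2, ... n − 1, we have
$$\mu_1' = \frac{n - 1}{2}, \quad \mu_2' = \frac{(n - 1)(2n - 1)}{6}, \quad \mu_3' = \frac{n (n - 1)^2}{4}, \text{ etc.}$$
Thus: $\mu_2 = \frac{1}{12} (n^2 - 1)$, $\mu_3 = 0$, etc.
From (22.38) and similar equations we then find
$$P_0 = 1, \quad P_1 = X - \frac{n - 1}{2}, \quad P_2 = P_1^2 - \frac{\mu_3}{\mu_2} P_1 - \mu_2 = P_1^2 - \frac{n^2 - 1}{12}, \qquad (22.63)$$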

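and so on. The polynomials may be obtained more systematically as follows:
We show first of all that

$$\sum_{j=0}^{p} \binom{n - 1}{j} \frac{\Delta^j P_p}{j + q} = 0, \quad q = 1, 2, \ldots p, \qquad (22.64)$$
where $\Delta^j$ is the jth terminal difference of $P_p$, and the x's range from 0 to n − 1. In fact,
from Newton's interpolation formula, writing $X^{[j]} = X (X - 1) \ldots (X - j + 1)$,

$$P_p = \sum_{j=0}^{p} \frac{X^{[j]}}{j!}\, \Delta^j P_p, \qquad (22.65)$$

and since the P's are orthogonal,

$$\Sigma\, (x + q - 1)^{[q-1]} P_p = 0, \quad q \le p. \qquad (22.66)$$
Substituting from (22.65), we find for the term in $\Delta^j P_p$:
$$\Sigma\, (x + q - 1)^{[q-1]}\, x^{[j]}\, \frac{\Delta^j P_p}{j!} = \Sigma\, (x + q - 1)^{[j+q-1]}\, \frac{\Delta^j P_p}{j!} = (n + q - 1)^{[j+q]}\, \frac{\Delta^j P_p}{(j + q)\, j!}.$$

Thus for all q from 1 to p we have

$$0 = \sum_{j=0}^{p} \frac{(n + q - 1)^{[j+q]}}{(j + q)\, j!}\, \Delta^j P_p = \frac{(n + q - 1)!}{(n - 1)!} \sum_{j=0}^{p} \binom{n - 1}{j} \frac{\Delta^j P_p}{j + q},$$
whence follows (22.64). We now find functions obeying these conditions.

Consider
$$y = C\, (x + p)^{[p]}. \qquad (22.67)$$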




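This is a polynomial of degree p, and if for x = 0, 1, ... p it assumes the values $y_0, \ldots y_p$
we have:

$$y = \sum_{j=0}^{p} \frac{x^{[p+1]}}{x - j} \cdot \frac{(-1)^{p-j}\, y_j}{j!\, (p - j)!}, \qquad (22.68)$$

for this also is of degree p and has the right values at x = 0, ... p. Taking now

$$y_j = (-1)^{p-j} \binom{n - 1}{j}\, j!\, (p - j)!\, \Delta^j P_p, \qquad (22.69)$$

we find that for x = −q

$$y (-q) = C\, (p - q)^{[p]} = (-1)^p\, \frac{(p + q)!}{(q - 1)!} \sum_{j=0}^{p} \binom{n - 1}{j} \frac{\Delta^j P_p}{j + q}. \qquad (22.70)$$
Now from the definition of y this clearly vanishes for −x = q = 1, ... p, and thus
(22.70) is zero. Comparing it with (22.64) we see that the conditions are satisfied if we
give to $y_j$ the value in terms of $\Delta^j$ of (22.69), i.e.

$$\Delta^j P_p = (-1)^{p-j}\, C\, \frac{(p + j)!\, (n - j - 1)!}{j!\, (p - j)!\, (n - 1)!}. \qquad (22.71)$$

The constant C is evaluated by the fact that the coefficient of $X^p$ in $P_p$ is unity, giving
$\Delta^p P_p = p!$ This gives
$$C = \frac{(p!)^2\, (n - 1)!}{(2p)!\, (n - p - 1)!}. \qquad (22.72)$$
Finally, substituting in (22.65), we find

$$P_p = \frac{(p!)^2}{(2p)!} \sum_{j=0}^{p} (-1)^{p-j}\, \frac{(p + j)!\, (n - j - 1)!}{(j!)^2\, (p - j)!\, (n - p - 1)!}\, X^{[j]}, \qquad (22.73)$$
where by convention the term $X^{[j]}$ is unity for j = 0. The first six polynomials are
$$P_1 = X - \frac{n - 1}{2}$$
$$P_2 = P_1^2 - \frac{n^2 - 1}{12}$$
$$P_3 = P_1^3 - \frac{3n^2 - 7}{20} P_1$$
$$P_4 = P_1^4 - \frac{3n^2 - 13}{14} P_1^2 + \frac{3 (n^2 - 1)(n^2 - 9)}{560} \qquad (22.74)$$
$$P_5 = P_1^5 - \frac{5 (n^2 - 7)}{18} P_1^3 + \frac{15n^4 - 230n^2 + 407}{1008} P_1$$
$$P_6 = P_1^6 - \frac{5 (3n^2 - 31)}{44} P_1^4 + \frac{5n^4 - 110n^2 + 329}{176} P_1^2 - \frac{5 (n^2 - 1)(n^2 - 9)(n^2 - 25)}{14784}$$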




Four more values are given by Allan (1930), to whom the above derivation of (22.73) 
is due. 

Values of the polynomials up to and including the fifth are given in Fisher and Yates’ 
Statistical Tables up to n = 52. 
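The polynomial values for a given n can also be generated from the factorial-power expansion of $P_p$. The sketch below assumes the form $P_p = \frac{(p!)^2}{(2p)!} \sum_j (-1)^{p-j} \frac{(p+j)!\,(n-j-1)!}{(j!)^2 (p-j)!\,(n-p-1)!} X^{[j]}$ with X running over 0, 1, ... n − 1, and the closed form $\Sigma P_p^2 = \frac{(p!)^4}{(2p)!\,(2p+1)!}\, n (n^2 - 1) \ldots (n^2 - p^2)$; the function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def falling(x, j):
    """Factorial power x^[j] = x (x - 1) ... (x - j + 1); unity for j = 0."""
    out = 1
    for k in range(j):
        out *= x - k
    return out


def orthogonal_value(p, n, x):
    """Value of the monic orthogonal polynomial P_p at x (x = 0 .. n - 1)."""
    total = Fraction(0)
    for j in range(p + 1):
        term = Fraction(
            factorial(p + j) * factorial(n - j - 1),
            factorial(j) ** 2 * factorial(p - j) * factorial(n - p - 1),
        )
        total += (-1) ** (p - j) * term * falling(x, j)
    return Fraction(factorial(p) ** 2, factorial(2 * p)) * total


def sum_of_squares(p, n):
    """Closed form for the sum of P_p^2 over x = 0 .. n - 1."""
    prod = Fraction(n)
    for k in range(1, p + 1):
        prod *= n * n - k * k
    return Fraction(factorial(p) ** 4, factorial(2 * p) * factorial(2 * p + 1)) * prod


n = 13
print([int(orthogonal_value(2, n, x)) for x in range(n)])
print([int(orthogonal_value(3, n, x) / 6) for x in range(n)])
```

For n = 13 the second line divides $P_3$ by 6, the scaling under which tabulated values are usually given as integers.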


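22.24. We can now find an explicit expression for $\Sigma P_p^2$. Since the polynomials
are orthogonal we have

$$\Sigma P_p^2 = \Sigma\, (x^p P_p) = \Sigma\, (x + p)^{[p]} P_p,$$
which, by the argument resulting in (22.64), leads to

$$\Sigma P_p^2 = \sum_{j=0}^{p} \frac{(n + p)!\, \Delta^j P_p}{j!\, (n - j - 1)!\, (p + j + 1)}.$$

Putting q = p + 1 in (22.67) and (22.70), we have

$$y (-q) = C\, (-1)^p\, p! = (-1)^p\, \frac{(2p + 1)!}{p!} \sum_{j=0}^{p} \binom{n - 1}{j} \frac{\Delta^j P_p}{p + j + 1},$$

whence, after a little rearrangement,

$$\sum_{j=0}^{p} \frac{(n + p)!\, \Delta^j P_p}{j!\, (n - j - 1)!\, (p + j + 1)} = \frac{(p!)^2\, (n + p)!\, C}{(2p + 1)!\, (n - 1)!},$$

and thus, substituting for C from (22.72), we find

$$\Sigma P_p^2 = \frac{(p!)^4}{(2p)!\, (2p + 1)!}\, n (n^2 - 1) \ldots (n^2 - p^2). \qquad (22.75)$$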


22.25. It is also possible to express the orthogonal polynomials in terms of central 
differences. We quote without proof the results (for details of which see Allan, 1930) :— 


eee A (Sy 
a ie Te Le 
a ittam—1)} 
where [x] Tie ae ° . . ° (22.77) 


The series is summed from j = 0 until 2j > p, when the denominator vanishes and (p − ½)!
is written for $\Gamma (p + \tfrac{1}{2})$ to preserve the factorial notation. In practice the polynomials
for particular examples are not determined from (22.73) or (22.76) but by the use of tables,
or by summation from differences in the manner of Example 22.9 below.


Example 22.7 


For the fitting of a regression line in the case of equidistant intervals various methods 
are in use. A choice between them depends on the length of the series, the order of regres- 
sion to which it is desired to go, and the computing resources at the investigator’s disposal. 
We will illustrate two methods in this and the next example. 




TABLE 22.2
Fitting of Regression Line by Orthogonal Polynomials. Equidistant x-intervals.

  (1)      (2)         (3)             (4)      (5)          (6)
  Year     Variate     Population      P_2      (1/6) P_3    (7/12) P_4
           P_1         (million) Y
  1811     −6          10.16           22       −11          99
  1821     −5          12.00           11       0            −66
  1831     −4          13.90           2        6            −96
  1841     −3          15.91           −5       8            −54
  1851     −2          17.93           −10      7            11
  1861     −1          20.07           −13      4            64
  1871     0           22.71           −14      0            84
  1881     1           25.97           −13      −4           64
  1891     2           29.00           −10      −7           11
  1901     3           32.53           −5       −8           −54
  1911     4           36.07           2        −6           −96
  1921     5           37.89           11       0            −66
  1931     6           39.95           22       11           99


In Table 22.2, column 3 shows the population of England and Wales (in millions)
for the years shown in column 1. These are at ten-yearly intervals, and the variate-values
in units of 10 with origin at the mid-point of the range are given in column (2). These
are the values of $P_1$.

The corresponding values of $P_2$, $\tfrac{1}{6} P_3$ and $\tfrac{7}{12} P_4$ are given in the last three columns. They
may be calculated direct from (22.74), but are most conveniently taken direct from the
Fisher-Yates tables.

We find, for n = 13,

$$\Sigma\, YP_1 = 474.77$$
$$\Sigma\, YP_2 = 123.19$$
$$\Sigma\, YP_3 = -39.38 \times 6 = -236.28$$
$$\Sigma\, YP_4 = -374.30 \times \tfrac{12}{7} = -641.657,143,$$

and, direct from the tables,

$$\Sigma P_1^2 = 182, \quad \Sigma P_2^2 = 2002, \quad \Sigma P_3^2 = 572 \times 36, \quad \Sigma P_4^2 = 68,068 \times (\tfrac{12}{7})^2.$$

Hence, from equations of the type $b_j = \Sigma (YP_j) / \Sigma (P_j^2)$, we find
$b_1 = 2.608,626$, $b_2 = 0.061,533,467$, $b_3 = -0.011,474,359$, $b_4 = -0.003,207,699$

and the quartic curve is
$$Y - 24.1608 = 2.6086X + 0.061,53 (X^2 - 14) - 0.011,47 (X^3 - 25X) - 0.003,208 \left( X^4 - \tfrac{247}{7} X^2 + 144 \right). \qquad (22.78)$$


We can now find the residuals for each term in this equation. We find

$$\Sigma\, Y^2 = 8839.9389$$
$$\Sigma\, Y = 314.09.$$

Hence the sum of squares of Y about the mean of Y,

$$\Sigma\, (Y - \bar{Y})^2 = 1251.283.$$
Thus we have:

                                                              Residual Sum of Squares.
  Original variation                                                      1251.283
  Contribution of first term = b_1 Σ(YP_1)            1238.497            12.786
  Contribution of second term = b_2 Σ(YP_2)           7.580               5.206
  Contribution of third term = b_3 Σ(YP_3)            2.711               2.495
  Contribution of fourth term = b_4 Σ(YP_4)           2.058               0.437

For the variance of the residual elements we divide by the number of degrees of freedom
(n − j − 1) and obtain

  Residual Sum of Squares.     Divisor.     Residual Variance.
  12.786                       11           1.162
  5.206                        10           0.521
  2.495                        9            0.277
  0.437                        8            0.055

Fig. 22.2 shows the data graphically with the cubic and quartic of closest fit. 


Fig. 22.2. Cubic (full line) and Quartic (broken line) Parabolas fitted to the Data of Table 22.2 (population in millions plotted against years).


The fit is evidently a good one, as is borne out by the smallness of the residual variance, 
but we must sound a warning as to the use of this polynomial. For interpolation in the 
variate range it would probably suit very well; but for extrapolation outside the range 
it is dangerous unless there is good reason to suppose that the polynomial has some theoretical 
basis (which is not so). It would, for instance, be most unsafe to try and estimate the 
population in 1960 by inserting X = 9 in equation (22.78).




Example 22.8 

In Chapter 3 it was seen that factorial moments can be derived by summatory pro- 
cesses. A somewhat similar method can be used to fit orthogonal polynomials. We will 
illustrate it on the data of the previous example. 


TABLE 22.3
Fitting of Orthogonal Polynomials by Factorial Sums.

  S_0        S_1         S_2          S_3
  10.16      10.16       10.16        10.16
  12.00      22.16       32.32        42.48
  13.90      36.06       68.38        110.86
  15.91      51.97       120.35       231.21
  17.93      69.90       190.25       421.46
  20.07      89.97       280.22       701.68
  22.71      112.68      392.90       1094.58
  25.97      138.65      531.55       1626.13
  29.00      167.65      699.20       2325.33
  32.53      200.18      899.38       3224.71
  36.07      236.25      1135.63      4360.34
  37.89      274.14      1409.77      5770.11
  39.95      314.09      1723.86      7493.97

  314.09     1723.86     7493.97

In Table 22.3 the column headed $S_0$ gives the value of Y. The next column, headed
$S_1$, gives the sums of the values in the first column proceeding from the top; and so for
the columns headed $S_2$ and $S_3$.

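Now construct the quantities

$$a'_0 = \frac{1}{n} S_1 = \frac{314.09}{13} = 24.160,769$$
$$a'_1 = \frac{2!}{n (n + 1)} S_2 = \frac{2\, (1723.86)}{182} = 18.943,516$$
$$a'_2 = \frac{3!}{n (n + 1)(n + 2)} S_3 = \frac{6\, (7493.97)}{2730} = 16.470,264,$$
the general formula being
$$a'_r = \frac{(r + 1)!\, S_{r+1}}{n (n + 1) \ldots (n + r)}. \qquad (22.79)$$

Then obtain the quantities

$$a_0 = a'_0 = 24.160,769$$
$$a_1 = a'_0 - a'_1 = 5.217,253$$
$$a_2 = a'_0 - 3a'_1 + 2a'_2 = 0.270,749,$$
the general formula being

$$a_p = a'_0 - \frac{p (p + 1)}{(1!)^2\, 2}\, a'_1 + \frac{(p - 1)\, p\, (p + 1)(p + 2)}{(2!)^2\, 3}\, a'_2 - \ldots \qquad (22.80)$$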




Finally put
$$b_0 = a_0 = 24.160,769$$
$$b_1 = \frac{6}{n - 1}\, a_1 = \tfrac{1}{2}\, a_1 = 2.608,626$$
$$b_2 = \frac{30}{(n - 1)(n - 2)}\, a_2 = \tfrac{5}{22}\, a_2 = 0.061,534,$$
the general formula being
$$b_p = \frac{(2p + 1)!}{(p!)^2\, (n - 1)(n - 2) \ldots (n - p)}\, a_p. \qquad (22.81)$$

Then the b's are the coefficients of the orthogonal polynomials in the regression equation.
The values we have found check with those of the previous example and the reader may
care to work out $b_3$ and $b_4$ by the same method.

This process is due to R. A. Fisher and avoids the direct calculation of the values of
the orthogonal polynomials. Its validity may be established by using equations (22.75)
and (22.73), which give


Te2yP, _ (2p !) (2p + 1)! 
> SPE (lim @t—l)... wt py? YP) 
= (2p+1)! Be) py) 1)! eee 
(p!)* @ 1). (n—ny gy !)* (p—7)! G1) (=p) et) 


The first part of the expression explains the coefficients in (22.81), the second part those 
in (22.80). The third part gives rise to (22.79) when it is remembered that the sums S 
are expressible as sums of factorials (cf. 3.10, vol. I, p. 58), but the summation takes place 
from the top of the column. 
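The summation process of this example lends itself to a short routine. The sketch below assumes the three steps in the forms $(r+1)!\,S_{r+1} / \{n(n+1)\ldots(n+r)\}$ for the scaled sums, alternating combinations with coefficients $(p+r)! / \{(p-r)!\,r!\,(r+1)!\}$, and the final multiplier $(2p+1)! / \{(p!)^2 (n-1)\ldots(n-p)\}$; these agree with the worked figures for orders 0 to 2, and the function and variable names are illustrative.

```python
from itertools import accumulate
from math import factorial


def fisher_coefficients(y, max_p):
    """Orthogonal-polynomial coefficients b_0..b_max_p by repeated summation."""
    n = len(y)
    column = list(y)
    scaled_sums = []
    denom = 1
    for r in range(max_p + 1):
        column = list(accumulate(column))  # next column of running sums from the top
        denom *= n + r
        scaled_sums.append(factorial(r + 1) * column[-1] / denom)

    b = []
    for p in range(max_p + 1):
        combo = sum(
            (-1) ** r * factorial(p + r)
            / (factorial(p - r) * factorial(r) * factorial(r + 1))
            * scaled_sums[r]
            for r in range(p + 1)
        )
        multiplier = factorial(2 * p + 1) / factorial(p) ** 2
        for k in range(1, p + 1):
            multiplier /= n - k
        b.append(multiplier * combo)
    return scaled_sums, b


Y = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
     25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
scaled_sums, b = fisher_coefficients(Y, 3)
print([round(v, 6) for v in b])
```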


Example 22.9 


As a rule it is unnecessary to evaluate the polynomial at all the points for which data 
are given; but if the values are desired for comparison with observation they may be 
obtained by summatory processes from the differences. 

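The terminal differences themselves are obtainable simply from the quantities $a_r$ of
the previous example. For a polynomial of the first degree we have

$$\Delta Y = -\frac{6}{n - 1}\, a_1, \qquad Y = a_0 + 3a_1. \qquad (22.82)$$
For that of the second degree,
$$\Delta^2 Y = \frac{60}{(n - 1)(n - 2)}\, a_2, \qquad \Delta Y = -\frac{6}{n - 1} (a_1 + 5a_2), \qquad Y = a_0 + 3a_1 + 5a_2. \qquad (22.83)$$
For the third degree,
$$\Delta^3 Y = -\frac{840}{(n - 1)(n - 2)(n - 3)}\, a_3, \qquad \Delta^2 Y = \frac{60}{(n - 1)(n - 2)} (a_2 + 7a_3),$$
$$\Delta Y = -\frac{6}{n - 1} (a_1 + 5a_2 + 14a_3), \qquad Y = a_0 + 3a_1 + 5a_2 + 7a_3. \qquad (22.84)$$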




The formulae for higher degrees are constructed on analogous lines, the multiplying
factors for successive differences being given by

$$\frac{(-1)^p\,(p+1)(p+2)\cdots(2p+1)}{(n-1)(n-2)\cdots(n-p)}$$

and the coefficients of the a's by

    Y       1    3    5    7    9   11
    ΔY           1    5   14   30   55
    Δ²Y               1    7   27   77   etc.
    Δ³Y                    1    9   44
    Δ⁴Y                         1   11
    Δ⁵Y                              1


We leave the proof of these results to the reader. 
For instance, for the data considered in the two previous examples we found, for the
parabola of the second degree,

$$Y = 24.160{,}8 + 2.608{,}6X + 0.061{,}533\,(X^2 - 14)$$
$$a'_0 = 24.160{,}769;\quad a'_1 = 5.217{,}253;\quad a'_2 = 0.270{,}749.$$

Hence, from (22.83),

$$\Delta^2 Y = \frac{60}{(n-1)(n-2)}\,a'_2 = 0.123{,}068$$
$$\Delta Y = -\frac{6}{n-1}\,(a'_1 + 5a'_2) = -3.285{,}499$$
$$Y = a'_0 + 3a'_1 + 5a'_2 = 41.166{,}273.$$

We then build up the polynomial values as shown in Table 22.4. The second difference
0.123,068 is shown at the foot of column (2). Being a constant, it could have been written


TABLE 22.4
Calculation of Polynomial Values from Differences.

   (1)          (2)           (3)           (4)          (5)         (6)
Number of     Second         First       Polynomial    Observed   Difference
  Term.     Difference.   Difference.      Value.       Value.     (5)-(4)

    1                      -1.808,68        9.863       10.16        0.297
    2                      -1.931,75       11.795       12.00        0.205
    3                      -2.054,82       13.849       13.90        0.051
    4                      -2.177,88       16.027       15.91       -0.117
    5                      -2.300,95       18.328       17.93       -0.398
    6                      -2.424,02       20.752       20.07       -0.682
    7                      -2.547,09       23.299       22.71       -0.589
    8                      -2.670,16       25.969       25.97        0.001
    9                      -2.793,23       28.763       29.00        0.237
   10                      -2.916,29       31.679       32.53        0.851
   11                      -3.039,36       34.718       36.07        1.352
   12                      -3.162,43       37.881       37.89        0.009
   13        0.123,068     -3.285,499      41.166,27    39.95       -1.216


all the way up, but to do so is a waste of time (and in practice, of course, we should not 
devote a separate column to it). The first difference is shown at the foot of column (3), 




and the figures above it constructed by adding the second difference at each stage. The 
polynomial values themselves are compiled by adding the first differences to the value 
at the foot of the column, 41-166,27. 

We have also shown the observed values and the difference between polynomial and 
observed values. The sum of squares of the latter is 5-204, agreeing within the margin 
of rounding-up error with the value for the sum of squares of residuals found in 
Example 22.7. 

As an exercise the reader should work out the polynomial values for the third- and 
fourth-order polynomials and compare the sum of squares of residuals with the values of 
Example 22.7. 
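
The building-up of Table 22.4 can be sketched as below. The code is an illustration only: it applies the second-degree formulae (22.83) to the values $a'_0$, $a'_1$, $a'_2$ quoted above, starts from the value at the foot of the column and works upwards, and then compares with the observed values of column (5). The function name `values_from_differences` is ours.

```python
def values_from_differences(a_prime, n):
    """Polynomial values for terms 1..n from the terminal differences (22.83)."""
    a0, a1, a2 = a_prime
    second = 60 / ((n - 1) * (n - 2)) * a2      # constant second difference
    first = -6 / (n - 1) * (a1 + 5 * a2)        # first difference at the foot
    values = [a0 + 3 * a1 + 5 * a2]             # polynomial value for term n
    for _ in range(n - 1):
        values.append(values[-1] + first)       # step one term up the column
        first += second                         # next first difference
    values.reverse()                            # order the values as terms 1..n
    return values


a_prime = (24.160769, 5.217253, 0.270749)
observed = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
            25.97, 29.00, 32.53, 36.07, 37.89, 39.95]

values = values_from_differences(a_prime, len(observed))
residual_ss = sum((o - v) ** 2 for o, v in zip(observed, values))
print([round(v, 3) for v in values])
print(round(residual_ss, 3))
```

The first and last entries should agree with column (4) of the table, and the sum of squares of the differences with the figure quoted in the text, apart from rounding.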


Multiple Curvilinear Regression 


22.26. We considered the linear regression of one variate on a number of others 
in Chapters 14 and 15. There now remains the extension of our results to the 
curvilinear case. 

The extension is very easy to carry out when we remember that in multiple linear
regression there is no restriction on the degree of dependence among the "independent"
variates. In particular, some of them may be functionally related, and more particularly
still, one variate may be a power of another. It is thus clear that the process of fitting
curved regression lines can be regarded as formally equivalent to that of fitting linear
regressions. For instance, the fitting of

$$Y = a_0 + a_1X_1 + a_2X_2 + a_3X_3 + a_4X_4 + a_5X_5$$

is equivalent to

$$Y = a_0 + a_1X_1 + a_2X_1^2 + a_3Z_1 + a_4Z_1^2 + a_5Z_1^3,$$

the latter being a particular case of the former where $X_2$ is the square of $X_1$ (and their
covariation accordingly complete) and similar relations exist between $X_3$, $X_4$ and $X_5$.

The case of curvilinear regression for a single variate, which has occupied the fore- 
going part of the chapter, could then have been treated by the methods of Chapter 15. 
We have discussed it afresh only because it is more easily dealt with by direct methods. 
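
As an illustration of this equivalence, the sketch below fits a parabola by treating $X$ and $X^2$ as two "independent" variates in an ordinary linear regression solved through the normal equations. It is a sketch under stated assumptions: the data are the observed values of Table 22.4, the variate $X$ is taken to run from -6 to 6 in unit steps (which is what the term $X^2 - 14$ in the fitted parabola implies for thirteen points), and the helper names `solve` and `fit_linear` are ours.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [u - factor * w for u, w in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear(columns, y):
    """Least-squares coefficients of y on the given columns (normal equations)."""
    gram = [[sum(u * w for u, w in zip(ci, cj)) for cj in columns] for ci in columns]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in columns]
    return solve(gram, rhs)


observed = [10.16, 12.00, 13.90, 15.91, 17.93, 20.07, 22.71,
            25.97, 29.00, 32.53, 36.07, 37.89, 39.95]
x = list(range(-6, 7))                      # assumed equidistant values of X

# Constant term, X, and X squared entered as three separate "variates".
columns = [[1.0] * len(x), [float(v) for v in x], [float(v * v) for v in x]]
a0, a1, a2 = fit_linear(columns, observed)
print(round(a0, 4), round(a1, 4), round(a2, 6))
```

The coefficients of $X$ and $X^2$ should agree with those of the orthogonal-polynomial fit, and $a_0 + 14a_2$ with its constant term.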


22.27. In multiple regression analysis it sometimes happens that, having worked out 
a regression equation, we wish either to take account of a new factor or to remove one 
which appears redundant. To avoid the necessity of solving a new set of determinantal 
equations the following device is useful :— 
Consider the case of three independent variates measured from their mean

$$Y = b_1X_1 + b_2X_2 + b_3X_3. \tag{22.85}$$

In accordance with our general method the constants b are given by

$$\begin{aligned}
b_1\sum(x_1^2) + b_2\sum(x_1x_2) + b_3\sum(x_1x_3) &= \sum(x_1y)\\
b_1\sum(x_1x_2) + b_2\sum(x_2^2) + b_3\sum(x_2x_3) &= \sum(x_2y)\\
b_1\sum(x_1x_3) + b_2\sum(x_2x_3) + b_3\sum(x_3^2) &= \sum(x_3y)
\end{aligned} \tag{22.86}$$

Suppose now we replace the functions $\sum(xy)$ on the right by 1, 0, 0 and obtain the solutions
$b_1 = c_{11}$, $b_2 = c_{12}$, $b_3 = c_{13}$; and similarly for replacement by 0, 1, 0 and 0, 0, 1,
the solutions being written

$$\begin{aligned}
b_1 &= c_{11},\; c_{12},\; c_{13}\\
b_2 &= c_{12},\; c_{22},\; c_{23}\\
b_3 &= c_{13},\; c_{23},\; c_{33}
\end{aligned} \tag{22.87}$$

Then the solution of (22.86) is

$$\begin{aligned}
b_1 &= c_{11}\sum(x_1y) + c_{12}\sum(x_2y) + c_{13}\sum(x_3y)\\
b_2 &= c_{12}\sum(x_1y) + c_{22}\sum(x_2y) + c_{23}\sum(x_3y)\\
b_3 &= c_{13}\sum(x_1y) + c_{23}\sum(x_2y) + c_{33}\sum(x_3y)
\end{aligned} \tag{22.88}$$

as is immediately evident on substitution. The values of the c's are those we have denoted


earlier in the chapter by determinantal forms, e.g. cj, = A\?)/A®. 


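22.28. Now suppose that we wish to discard the variate $x_3$. From (22.86), with
1, 0, 0 written on the right, we find

$$c_{11} = \frac{1}{\Delta}\begin{vmatrix}(22) & (23)\\ (23) & (33)\end{vmatrix} \tag{22.89}$$

where $(jk)$ stands for $\sum(x_jx_k)$, and

$$\Delta = \begin{vmatrix}(11) & (12) & (13)\\ (12) & (22) & (23)\\ (13) & (23) & (33)\end{vmatrix}. \tag{22.90}$$

There are similar expressions for the other c's. If the values of the constants when $x_3$
is removed are $c'_{11}$, $c'_{12}$, $c'_{22}$, we shall have

$$c'_{11} = \frac{1}{\Delta'}\,(22); \qquad c'_{12} = -\frac{1}{\Delta'}\,(12); \qquad \text{etc.} \tag{22.91}$$

where

$$\Delta' = \begin{vmatrix}(11) & (12)\\ (12) & (22)\end{vmatrix}. \tag{22.92}$$

Now we have

$$\frac{c_{13}c_{23}}{c_{33}} = \frac{\begin{vmatrix}(11) & (12) & 1\\ (12) & (22) & 0\\ (13) & (23) & 0\end{vmatrix}\;\begin{vmatrix}(11) & (12) & 0\\ (12) & (22) & 1\\ (13) & (23) & 0\end{vmatrix}}{\Delta\begin{vmatrix}(11) & (12) & 0\\ (12) & (22) & 0\\ (13) & (23) & 1\end{vmatrix}} = -\frac{\begin{vmatrix}(12) & (22)\\ (13) & (23)\end{vmatrix}\;\begin{vmatrix}(11) & (12)\\ (13) & (23)\end{vmatrix}}{\Delta\Delta'}.$$

Thus

$$c_{12} - \frac{c_{13}c_{23}}{c_{33}} = \frac{c_{12}c_{33} - c_{13}c_{23}}{c_{33}} = \frac{-\begin{vmatrix}(12) & (23)\\ (13) & (33)\end{vmatrix}\begin{vmatrix}(11) & (12)\\ (12) & (22)\end{vmatrix} + \begin{vmatrix}(12) & (22)\\ (13) & (23)\end{vmatrix}\begin{vmatrix}(11) & (12)\\ (13) & (23)\end{vmatrix}}{\Delta\Delta'}$$
$$= -\frac{(12)\,\Delta}{\Delta\Delta'} = c'_{12}. \tag{22.93}$$

Similarly

$$c'_{11} = c_{11} - \frac{c_{13}^2}{c_{33}} \tag{22.94}$$

$$c'_{22} = c_{22} - \frac{c_{23}^2}{c_{33}}. \tag{22.95}$$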


This gives us the new c's in terms of the old. Denoting similarly the new b's by primes,
we have

$$b_1 - b'_1 = (c_{11} - c'_{11})\sum(x_1y) + (c_{12} - c'_{12})\sum(x_2y) + c_{13}\sum(x_3y)$$
$$= \frac{1}{c_{33}}\left\{c_{13}^2\sum(x_1y) + c_{13}c_{23}\sum(x_2y) + c_{13}c_{33}\sum(x_3y)\right\} = \frac{c_{13}}{c_{33}}\,b_3.$$

Hence we have

$$b'_1 = b_1 - \frac{c_{13}}{c_{33}}\,b_3, \qquad b'_2 = b_2 - \frac{c_{23}}{c_{33}}\,b_3, \tag{22.96}$$

expressing the new constants in terms of the old and the known constants c.
Finally, the contribution to the sum of squares due to the variate $x_3$ is

$$b_1\sum(x_1y) + b_2\sum(x_2y) + b_3\sum(x_3y) - b'_1\sum(x_1y) - b'_2\sum(x_2y)$$
$$= \frac{c_{13}}{c_{33}}\,b_3\sum(x_1y) + \frac{c_{23}}{c_{33}}\,b_3\sum(x_2y) + b_3\sum(x_3y)$$
$$= \frac{b_3^2}{c_{33}}. \tag{22.97}$$

22.29. Generally, if there are p independent variates the equations for the b's are

$$\begin{aligned}
b_1\sum(x_1^2) + b_2\sum(x_1x_2) + \cdots + b_p\sum(x_1x_p) &= \sum(yx_1)\\
\cdots\qquad\qquad & \\
b_1\sum(x_1x_p) + b_2\sum(x_2x_p) + \cdots + b_p\sum(x_p^2) &= \sum(yx_p).
\end{aligned}$$

If $x_p$ is omitted the equations become $(p - 1)$ in number in variables $b'_1 \ldots b'_{p-1}$. Subtracting
from these the first $(p - 1)$ of the above equations we find $(p - 1)$ equations,
typified by

$$(b'_1 - b_1)\sum(x_1x_j) + (b'_2 - b_2)\sum(x_2x_j) + \cdots + (b'_{p-1} - b_{p-1})\sum(x_{p-1}x_j) - b_p\sum(x_jx_p) = 0. \tag{22.98}$$

But these equations are the same as those for the coefficients $c_{1p} \ldots c_{pp}$ with $(b'_1 - b_1)$
in place of $c_{1p}$, etc., and $-b_p$ in place of $c_{pp}$. Hence

$$\frac{b'_1 - b_1}{-b_p} = \frac{c_{1p}}{c_{pp}}, \qquad\text{or}\qquad b'_1 = b_1 - \frac{c_{1p}}{c_{pp}}\,b_p. \tag{22.99}$$

Similarly it will be found that

$$c'_{11} = c_{11} - \frac{c_{1p}^2}{c_{pp}}, \qquad c'_{12} = c_{12} - \frac{c_{1p}c_{2p}}{c_{pp}}, \tag{22.100}$$

with similar equations for the other c's.


22.30. Somewhat similar results apply when a variate is added. If primes again
refer to new coefficients when $x_q$ is added, we have, as above,

$$b_1 = b'_1 - \frac{c'_{1q}}{c'_{qq}}\,b'_q, \qquad c_{11} = c'_{11} - \frac{c'^{\,2}_{1q}}{c'_{qq}}, \qquad c_{12} = c'_{12} - \frac{c'_{1q}c'_{2q}}{c'_{qq}}. \tag{22.101}$$

In order to use these equations to adjust the constants we require $c'_{1q} \ldots c'_{qq}$ and $b'_q$.
By writing down the equations satisfied by $c'_{i1} \ldots c'_{iq}$ and subtracting the corresponding
equations in $c_{i1} \ldots c_{ip}$, we get p equations such as

$$(c'_{i1} - c_{i1})\sum(x_jx_1) + \cdots + (c'_{ip} - c_{ip})\sum(x_jx_p) = -c'_{iq}\sum(x_jx_q).$$

These are the same as the equations in $b_1 \ldots b_p$ with $-c'_{iq}\sum(x_jx_q)$ instead of $\sum(x_jy)$
on the right, and hence

$$c'_{ip} - c_{ip} = -c'_{iq}\sum_{j=1}^{p} c_{pj}\sum(x_jx_q).$$

Thus, using (22.101),

$$\frac{c'_{pq}}{c'_{qq}} = -\sum_{j=1}^{p} c_{pj}\sum(x_jx_q). \tag{22.102}$$

The last of the equations satisfied by $c'_{iq}$ is

$$c'_{1q}\sum(x_qx_1) + \cdots + c'_{pq}\sum(x_qx_p) + c'_{qq}\sum(x_q^2) = 1.$$

Substituting for $c'_{1q}$, etc., in terms of $c'_{qq}$, we get

$$c'_{qq}\left\{\sum(x_q^2) - \sum_{j,k=1}^{p} c_{jk}\sum(x_jx_q)\sum(x_kx_q)\right\} = 1. \tag{22.103}$$

This gives $c'_{qq}$, and $c'_{1q} \ldots c'_{pq}$ are derivable from (22.102). The other constants then
result from (22.101).

Cochran (1938a), to whom this proof is due, says that the elimination of two variates 
is best carried out in two stages of one each; that where one variate is eliminated the 
method is quicker than re-solving the regression equations, except where there are only 
two independent variates in the first instance ; and that if two variates are being eliminated 
the method is quicker if the original number of independent variates is six or more. For 
the addition of variates the method is in all cases more expeditious than re-solving the 
regression equations.


Example 22.10 (Cochran, 1938a) 


In a study of the effect of weather factors on the number of noctuid moths per night
caught in a light-trap, regressions were worked out on $X_1$ (minimum night temperature),
$X_2$ (the maximum temperature of the previous day), $X_3$ (the average speed of the wind
during the night), and $X_4$ (the amount of rain during the night). The dependent variate
was $\log(1 + n)$, where n was the number of moths.

It was subsequently decided to investigate the effect of cloudiness, measured on a
conventional scale as the percentage of starlight obscured by clouds in a night sky camera.
This is the new variate $X_5$.

The quantities $c_{jk}$ for the first four variates were:

              X_1               X_2               X_3               X_4
    X_1   +0.105,423,56    -0.041,946,20    -0.096,067,09    -0.018,490,96
    X_2        ...         +0.086,038,69    +0.033,172,71    +0.012,903,58
    X_3        ...              ...         +0.572,652,01    +0.008,116,62
    X_4        ...              ...              ...         +0.062,275,32

and the sums $\sum(x_jx_5)$ were

$$\sum(x_1x_5) = -4.867, \quad \sum(x_2x_5) = +0.206, \quad \sum(x_3x_5) = -0.5446,$$
$$\sum(x_4x_5) = -5.42, \quad \sum(x_5^2) = 7.87.$$

We then find from (22.103)

$$c'_{55} = +0.210{,}133{,}14,$$

and from (22.102)

$$\frac{c'_{15}}{c'_{55}} = +0.369{,}198{,}24, \quad \frac{c'_{25}}{c'_{55}} = -0.133{,}872{,}86, \quad \frac{c'_{35}}{c'_{55}} = -0.118{,}533{,}74, \quad \frac{c'_{45}}{c'_{55}} = +0.249{,}298{,}91,$$

so that the new c's are given by (22.101) as

              X_1               X_2               X_3               X_4               X_5
    X_1   +0.134,066,25    -0.052,332,16    -0.105,263,03    +0.000,849,84    +0.077,580,79
    X_2        ...         +0.089,804,68    +0.036,507,20    +0.005,890,52    -0.028,131,12
    X_3        ...              ...         +0.575,604,43    +0.001,907,12    -0.024,907,87
    X_4        ...              ...              ...         +0.075,335,08    +0.052,385,96
    X_5        ...              ...              ...              ...         +0.210,133,14

The original regression coefficients were

$$b_1 = +0.198{,}140{,}7, \quad b_2 = +0.038{,}528{,}4, \quad b_3 = -0.508{,}649{,}2, \quad b_4 = +0.031{,}848{,}2.$$

We now find

$$b'_5 = \sum_{j=1}^{5}\left\{c'_{j5}\sum(x_jy)\right\} = -0.227{,}149{,}6,$$

and from (22.101) we then have

$$b'_1 = +0.114{,}277{,}5, \quad b'_2 = +0.068{,}937{,}6, \quad b'_3 = -0.481{,}724{,}3, \quad b'_4 = -0.024{,}779{,}9.$$
As usual we have retained more figures than are necessary, in order to avoid cumulating 
errors and to facilitate the detection of computational slips. 
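
The arithmetic of this example may be retraced with the short sketch below, which applies (22.102), (22.103) and (22.101) in that order to the figures quoted above. It is an illustration rather than Cochran's own computing scheme; the function name `add_variate` is ours, and since the sums $\sum(x_jy)$ are not reproduced in the text the value of $b'_5$ is taken as given.

```python
def add_variate(c, s_q, s_qq):
    """Adjust the constants c when a new variate x_q is added.

    c    : p x p matrix of the old constants c_jk
    s_q  : sums of products sum(x_j x_q), j = 1..p
    s_qq : sum of squares sum(x_q^2)
    Returns the ratios c'_jq / c'_qq and the (p+1) x (p+1) matrix of new constants.
    """
    p = len(c)
    # (22.102): c'_iq / c'_qq = -sum_j c_ij sum(x_j x_q)
    ratios = [-sum(c[i][j] * s_q[j] for j in range(p)) for i in range(p)]
    # (22.103), written with the ratios already substituted
    c_qq = 1.0 / (s_qq + sum(r * s for r, s in zip(ratios, s_q)))
    # (22.101): c'_ik = c_ik + c'_iq c'_kq / c'_qq
    new = [[c[i][k] + ratios[i] * ratios[k] * c_qq for k in range(p)]
           + [ratios[i] * c_qq] for i in range(p)]
    new.append([ratios[k] * c_qq for k in range(p)] + [c_qq])
    return ratios, new


c = [[0.10542356, -0.04194620, -0.09606709, -0.01849096],
     [-0.04194620, 0.08603869, 0.03317271, 0.01290358],
     [-0.09606709, 0.03317271, 0.57265201, 0.00811662],
     [-0.01849096, 0.01290358, 0.00811662, 0.06227532]]
s_5 = [-4.867, 0.206, -0.5446, -5.42]
s_55 = 7.87

ratios, c_new = add_variate(c, s_5, s_55)

b_old = [0.1981407, 0.0385284, -0.5086492, 0.0318482]
b5_new = -0.2271496                      # taken from the text
# (22.101) rearranged: b'_j = b_j + (c'_j5 / c'_55) b'_5
b_new = [bj + r * b5_new for bj, r in zip(b_old, ratios)]

print(round(c_new[4][4], 8))
print([round(r, 8) for r in ratios])
print([round(v, 7) for v in b_new])
```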




22.31. The constants c found in the foregoing method have a further use: they
give the standard errors of the regression coefficients and provide some of the functions
required in more exact tests based on the t-distribution. If, measuring y about the mean,
we have

$$Y = b_1X_1 + b_2X_2 + \cdots + b_pX_p$$

then there are p equations of the kind:

$$\sum(x_1y) = b_1\sum(x_1^2) + b_2\sum(x_1x_2) + \cdots + b_p\sum(x_1x_p),$$

and thus, recalling the definition of the c's, we have

$$b_1 = c_{11}\sum(x_1y) + c_{12}\sum(x_2y) + \cdots + c_{1p}\sum(x_py).$$

Thus, for fixed values of the x's,

$$\operatorname{var} b_1 = \operatorname{var} y \sum_{j,k} c_{1j}c_{1k}\sum(x_jx_k) = c_{11}\operatorname{var} y, \tag{22.104}$$

and so for the other b's.
For large samples var y may be taken to be the estimated variance

$$\frac{1}{n-p-1}\sum(y - Y)^2.$$

If the sample is small and it is desired to make a more accurate test, then we have,
by an extension of 22.21, that

$$t = \frac{(b_1 - \beta_1)\sqrt{n-p-1}}{\sqrt{c_{11}\sum(y - Y)^2}} \tag{22.105}$$

is distributed in "Student's" form with $\nu = n - p - 1$ degrees of freedom.
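
A small sketch of the calculation follows. It assumes that the residual sum of squares $\sum(y - Y)^2$ is divided by $n - p - 1$, and the figures in the usage lines are hypothetical, chosen only to show the arithmetic; the function name `coefficient_t` is ours.

```python
from math import sqrt


def coefficient_t(b, beta, c_jj, residual_ss, n, p):
    """Standard error and t for one regression coefficient.

    b, beta     : estimated and hypothetical values of the coefficient
    c_jj        : the diagonal constant c_jj belonging to that coefficient
    residual_ss : sum of squares of residuals, sum (y - Y)^2
    n, p        : number of observations and of independent variates
    """
    dof = n - p - 1
    estimated_var_y = residual_ss / dof           # estimate of var y
    std_error = sqrt(c_jj * estimated_var_y)      # from (22.104)
    return (b - beta) / std_error, std_error, dof


# Hypothetical figures, for illustration only.
t, std_error, dof = coefficient_t(b=2.0, beta=0.0, c_jj=0.1,
                                  residual_ss=8.0, n=11, p=2)
print(round(t, 4), round(std_error, 4), dof)
```

The value of t so found would then be referred to the tables of "Student's" distribution with the degrees of freedom returned.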


22.32. As a final comment we may emphasise that regression equations are only 
polynomials fitted to the means of arrays, and consequently that if the scatter about 
those means is substantial they are not very reliable as estimators (though they may be 
better than other methods). The comment would hardly be necessary were it not for a 
tendency to use the equations somewhat uncritically for purposes of prediction. The 
point assumes even greater importance when attempts are made to estimate the dependent 
variate for values of the independent variates outside the range on which the regressions 
are based ; or again, if the observations are distributed over time so that the population 
may be changing while the sample is being drawn. The technique of regression analysis
is undoubtedly useful in many fields, but—as with many other statistical techniques— 
the careful investigator will apply it with a certain amount of self-discipline. 


NOTES AND REFERENCES 


The theory of curvilinear regression was studied by Karl Pearson (1905). Orthogonal 
polynomials had been considered, and the essential problems solved, by Tchebycheff as 
far back as 1857, but their use in statistics was not fully appreciated until about sixty years 
later. Pearson gave in 1921 the general formulae for fitting curved regression lines up to 
the fourth order. Neyman (1926) pointed out the elegance of the determinantal approach. 

From about 1920 onwards there may be discerned two main lines of development.
The Scandinavian school, led by Wicksell, has developed the analytical theory of regression;
see Wicksell (1917b, 1933, 1934b) and a useful memoir by W. Andersson (1932). The
second line, followed by Fisher, Aitken and others, has been concerned with the fitting of
regression curves to arithmetical data and exact significance tests; see Fisher's papers of
1921b, 1922b, 1924b, 1926a, a paper by Allan (1930), and three papers by Aitken (1933a,
b, c). The literature on orthogonal polynomials is now very large.

For some illustrative material, see K. Pearson (1905), Andersson (1932), and Pretorius 
(1930). See also references to Chapters 14 and 15. 


EXERCISES 


22.1. Show that the regression of y on the variance of x (the scedastic curve) is 
given by 


= tga oe Ay Dig (X) ey : A oe D* g (X) Di-*g (X)]? 
ea a oe) » a DAG) wee ge ge 


s=0 
where (47) (2)! 
j! j! ; 
(Wicksell, 1934b.)


22.2. Show that if the regression of y on the mean of z is linear, then from (22.11) 


is a linear function of ¢ (¢,) and 2 ¢ (t,). Hence that 
ul 


Kj, Kao = Ki1 Kj+1,0 


(Wicksell, 1934b.)


22.3. Show that if the marginal distribution of a bivariate distribution is of the 
Gram-Charlier Type A: 
f=a(z){l +a,;H,+a,H,+...} 


the regression of y on ~ is 


90 ie?) 


a” 2, ay, H; 41 (X) 


4 a j7=0 k=0 
1+ 2) a; H; (X) 
TSE 
(Wicksell, 1917b.) 


22.4. Transforming the orthogonal polynomials of (22.74) to a new variate 


ey a3 note that P, — §P,_, is a numerical multiple of P,_., say APy-2. Show 
that 
A = male 
PI ina 


and deduce the recurrence relation, 
= ee) ee a 1)*} 
ae seers) 
(Allan, 1930. The relation is due to Tchebycheff.) 


22.5. A regression line 
Y =a, +a,% +4, 4° pe, x8 a xe 


is fitted to normal data and the number of observations N is large. 


se 
between the variates and ¢ = (the moments referring to the x-variate), show that 
2 . 


If r is the correlation 


var d) = thst (45 + 30c? — 8c® + c4) (1 — r?) 


24N 
var a, = = *Y (15 + 30 — 15c2 + 4c) (1 — r?) 
6N uz 
_ Vale i — 
VOR Oy = ele 3c + 3c?) (1 — r?) 
var y 
= 1 4c) (1 — r? 
var a; = ane r°) 
var y 
a ar oa r*): 


(Andersson, 1932.) 


22.6. In the notation of 22.31 show that

$$\operatorname{cov}(b_1, b_2) = c_{12}\operatorname{var} y$$

and hence show how to test the difference of two coefficients in a regression equation.


22.7. Show how to derive a test of the significance of the difference of corresponding 
regression coefficients in two equations derived from independent samples, based on the 


result of 21.26. 


CHAPTER 23 
THE ANALYSIS OF VARIANCE—(1) 


23.1. At various points in this book we have encountered in different guises the 
result that the sum of squares of a set of observations about their mean can be represented 
as the sum of two independent sums of squares, each of which provides an estimate of 
the parent variance; and that their ratio provides a test of homogeneity, at least when 
the parent is normal. We now proceed to study in more detail a method of statistical 
analysis with considerable generality which springs from this result. In view of the com- 
plexity of the general case we shall begin by considering simpler cases under somewhat 
restrictive conditions and shall extend our results stage by stage. 


One-way Classification 
23.2. Suppose we have a set of variate-values divided into p families:

$$\begin{matrix}
x_{11}, & x_{21}, & \ldots & x_{n_1 1}\\
x_{12}, & x_{22}, & \ldots & x_{n_2 2}\\
\vdots & & & \vdots\\
x_{1p}, & x_{2p}, & \ldots & x_{n_p p}.
\end{matrix}$$

Denoting by $\bar{x}$ the mean of the whole set and by $\bar{x}_j$ the mean of the values in the jth family,
we have the identity

$$\sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j + \bar{x}_j - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_{i,j}(\bar{x}_j - \bar{x})^2 \tag{23.1}$$

since the cross-product term $\sum_{i,j}(x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x})$ vanishes. We may also write this as

$$\sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_{j} n_j(\bar{x}_j - \bar{x})^2, \tag{23.2}$$

where $n_j$ is the number of members in the jth family.
It will also be convenient, from the point of view of a later generalisation, to write
the mean of the jth family as $x_{.j}$ and that of the whole as $x_{..}$, the periods in the subscripts
showing which factor is being averaged. We have then the alternative form

$$\sum_{i,j}(x_{ij} - x_{..})^2 = \sum_{i,j}(x_{ij} - x_{.j})^2 + \sum_{j} n_j(x_{.j} - x_{..})^2. \tag{23.3}$$

23.3. The problem we shall discuss in connection with families of values of this type
takes some such form as the following: the members of each family are randomly chosen
from some parent population corresponding to that family. The populations themselves
are, as a rule, defined by some prior system of classification given among the data of the
problem, e.g. they might be different varieties of wheat, the x's being the yields of the
varieties grown under similar conditions, or they might be defined by income levels and
the x's the expenditure on food of a sample chosen from the different income groups. We
now ask: is there any evidence that the factor measured by x varies significantly from
family to family? Alternatively, can the data be regarded as homogeneous, i.e. as emanating
from populations which are identical so far as concerns the factor measured by x?
Further, when the question of significance is decided, how can we estimate the variation
of x in families or groups of families, and how can we estimate the magnitude of any
differences which exist?


23.4. We will assume, until further notice, that within each family the variation 
is normal with variance v, and that v is the same for each family. In later sections we 
shall endeavour to remove these rather restrictive conditions. On our present hypothesis 
the populations corresponding to the different families can differ, if at all, only in their 
means, and our first question is whether the sample values afford any evidence of such 
differences. 

Let us take as our hypothesis that the parent populations have a common mean m. 
Then we recall the following facts :— 


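(1) The sum $\frac{1}{v}\sum_{i,j}(x_{ij} - x_{..})^2$ is distributed in the Type III form of $\chi^2$ with
$N - 1 = \sum_j(n_j) - 1$ degrees of freedom, that is to say as the sum of squares of $N - 1$
independent normal variates with zero mean and unit variance.

(2) In any given family $x_{.j}\sqrt{n_j/v}$ is distributed normally with unit variance about
mean $m\sqrt{n_j/v}$, and is independent of the sum $\frac{1}{v}\sum_i(x_{ij} - x_{.j})^2$ which is itself distributed as $\chi^2$
with $n_j - 1$ degrees of freedom.

Since on our hypothesis the observations may be regarded as a single sample from
the same population, it follows that

$$\left.\begin{aligned}
&\frac{1}{v}\sum_{i,j}(x_{ij} - x_{..})^2 \ \text{is distributed as } \chi^2 \text{ with } N - 1 \text{ d.f.}\\
&\frac{1}{v}\sum_{i,j}(x_{ij} - x_{.j})^2 \ \text{is distributed as } \chi^2 \text{ with } \sum_j(n_j - 1) = N - p \text{ d.f.}\\
&\frac{1}{v}\sum_{j} n_j(x_{.j} - x_{..})^2 \ \text{is distributed as } \chi^2 \text{ with } p - 1 \text{ d.f.}
\end{aligned}\right\} \tag{23.4}$$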

The only statement requiring any proof is the last. It may be proved directly (see Exercise 
23.1), but we shall deduce it as the corollary of a general theorem due to R. A. Fisher which 
will often be required in this chapter. 


23.5. Suppose we have q variates $x_1 \ldots x_q$ which are independently and normally
distributed with unit variance about the same mean, which we may assume to be
zero. Put

$$\zeta_r = \sum_{s=1}^{q} \lambda_{rs}x_s, \qquad r = 1 \ldots q. \tag{23.5}$$

If we choose the coefficients $\lambda$ so that

$$\sum_{s} \lambda_{rs}\lambda_{ts} = \begin{cases} 1, & r = t\\ 0, & r \neq t \end{cases} \tag{23.6}$$

then each $\zeta$ is distributed normally with unit variance independently of the others. There
are $q^2$ coefficients $\lambda$, and the equations (23.6) impose $\tfrac{1}{2}q(q + 1)$ conditions on them, so that
the $\lambda$'s can always be found in a multiplicity of ways. In effect they correspond to the
rotation of orthogonal co-ordinate axes in a q-dimensional space.

Now suppose that we have h linear functions of the x's, $\zeta_1 \ldots \zeta_h$ $(h < q)$, whose
coefficients obey the orthogonality relations (23.6). These h variates are then distributed
independently, normally and with unit variance.

It is now possible to find $q - h$ further variates $\zeta_{h+1} \ldots \zeta_q$ which are orthogonal
among themselves and to $\zeta_1 \ldots \zeta_h$. Geometrically this is evident from the possibilities
of rotations in the q-way space. Algebraically it follows from the consideration that if
$qh$ of the $\lambda$'s in (23.6) are known, $q(q - h)$ are unknown, and the number of conditions
they must obey is

$$\tfrac{1}{2}q(q + 1) - \tfrac{1}{2}h(h + 1) = \tfrac{1}{2}(q - h)(q + h + 1),$$

so that values of the unknowns can be found in at least one way if

$$\tfrac{1}{2}(q + h + 1) \le q \qquad\text{or}\qquad h + 1 \le q.$$

Now suppose we express a sum of squares of q normal variates with unit variance,
say A, as the sum of two quantities B and C; and suppose that B is distributed as the
sum of squares of h independent normal variates with unit variance which are linear
functions of the variates entering into A. Then we can find $q - h$ such variates independent
of the first h, and C must be their sum of squares. Further, the distributions
of B and C are independent. By an extension of the same argument, if

$$A = A_1 + A_2 + \cdots + A_k, \tag{23.7}$$

A is distributed as $\chi^2$ with $\nu$ degrees of freedom, $A_1$ with $\nu_1$, ..., $A_{k-1}$ with $\nu_{k-1}$; and
if the variates entering into $A_1 \ldots A_{k-1}$ are mutually independent and are linear functions
of those entering into A, then $A_k$ is distributed as $\chi^2$ with $\nu_k$ degrees of freedom, where

$$\nu = \nu_1 + \nu_2 + \cdots + \nu_k \tag{23.8}$$

and $A_k$ is independent of $A_1, \ldots, A_{k-1}$.
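
One explicit choice of coefficients obeying (23.6) may help to fix ideas. The sketch below builds what is usually called the Helmert transformation (our choice of example; the text does not name it), in which the first variate is $\sqrt{q}$ times the mean, and verifies that the total sum of squares splits into the square of that variate and the sum of squares about the mean carried by the remaining $q - 1$ variates. The figures for x are arbitrary.

```python
from math import sqrt, isclose


def helmert(q):
    """Rows lambda_r of an orthogonal q x q matrix; row 0 picks out the mean."""
    rows = [[1 / sqrt(q)] * q]
    for r in range(1, q):
        norm = sqrt(r * (r + 1))
        rows.append([1 / norm] * r + [-r / norm] + [0.0] * (q - r - 1))
    return rows


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


x = [3.0, 1.0, 4.0, 1.0, 5.0]           # arbitrary illustrative values
q = len(x)
lam = helmert(q)
zeta = [dot(row, x) for row in lam]     # the transformed variates of (23.5)

mean = sum(x) / q
total_ss = sum(v * v for v in x)
about_mean_ss = sum((v - mean) ** 2 for v in x)

# Orthogonality relations (23.6).
orthonormal = all(
    isclose(dot(lam[r], lam[t]), 1.0 if r == t else 0.0, abs_tol=1e-12)
    for r in range(q) for t in range(q)
)
print(orthonormal, total_ss, zeta[0] ** 2, sum(z * z for z in zeta[1:]))
```

Here A is the total sum of squares, B the square of the first transformed variate, and C the sum of squares about the mean, which is thus seen to be a sum of $q - 1$ squares of orthogonal linear functions of the x's.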


23.6. As an extension and kind of converse of this theorem we have the result, due
to Cochran, that if $A_1 \ldots A_k$ are distributed as $\chi^2$ with $\nu_1 \ldots \nu_k$ degrees of freedom,
and their sum A is distributed as $\chi^2$ with $\nu = \sum(\nu_j)$ degrees, then $A_1 \ldots A_k$ are independent.
We will prove this for the case $k = 2$, the more general result following in a
similar way.

If the characteristic function of $A_1$ and $A_2$ is $\phi(t_1, t_2)$, we have, by hypothesis,

$$\phi(t_1, 0) = \frac{1}{(1 - 2it_1)^{\frac{1}{2}\nu_1}}$$
$$\phi(0, t_2) = \frac{1}{(1 - 2it_2)^{\frac{1}{2}\nu_2}}$$
$$\text{and}\quad \phi(t, t) = \frac{1}{(1 - 2it)^{\frac{1}{2}(\nu_1 + \nu_2)}}.$$
Hence

$$\phi(t, 0)\,\phi(0, t) = \frac{1}{(1 - 2it)^{\frac{1}{2}(\nu_1 + \nu_2)}} = \phi(t, t),$$

and thus $\phi(t, 0)$ and $\phi(0, t)$ are both divisible by a factor in $(1 - 2it)^{-\frac{1}{2}}$ and no other
factor in t because of the symmetry of $\phi(t_1, t_2)$. These factors are identified by $\phi(t_1, 0)$
and $\phi(0, t_2)$ as $(1 - 2it_1)^{-\frac{1}{2}\nu_1}$ and $(1 - 2it_2)^{-\frac{1}{2}\nu_2}$, and hence

$$\phi(t_1, t_2) = \phi(t_1, 0)\,\phi(0, t_2),$$

or $A_1$ and $A_2$ are independent.


23.7. Let us now return to the statements in (23.4). The sum $\frac{1}{v}\sum(x_{ij} - x_{..})^2$ is
distributed as $\chi^2$ with $\nu = N - 1$. The sum $\frac{1}{v}\sum(x_{ij} - x_{.j})^2$ is so distributed with
$\nu = N - p$. Further, the quantities $x_{ij} - x_{.j}$ may be transformed to $N - p$ independent
normal variates which are linear functions of the variates entering into the first sum. It
follows from 23.5 that because of the identity (23.3) the third sum $\frac{1}{v}\sum_j n_j(x_{.j} - x_{..})^2$ is
distributed as $\chi^2$ with $\nu = (N - 1) - (N - p) = p - 1$ degrees of freedom, and that
independently of the second sum.
Thus we may exhibit our break-up of the total sum in the following form:


TABLE 23.1
Form of Analysis of Variance for One-way Classification.

                                      Sum of Squares.                       d.f.       Quotient.

Of family means about the mean        $\sum_j n_j(x_{.j} - x_{..})^2$       $p - 1$    $\frac{1}{p-1}\sum_j n_j(x_{.j} - x_{..})^2$
of the whole

Of individuals in families about      $\sum_{i,j}(x_{ij} - x_{.j})^2$       $N - p$    $\frac{1}{N-p}\sum_{i,j}(x_{ij} - x_{.j})^2$
the respective family mean

Of individuals about the mean of      $\sum_{i,j}(x_{ij} - x_{..})^2$       $N - 1$    $\frac{1}{N-1}\sum_{i,j}(x_{ij} - x_{..})^2$
the whole


We note that the sums of squares and the degrees of freedom in the first two rows sum to 
those in the third row (though the quantities in the quotient column are not additive). 
This is the origin of the expression ‘‘ analysis of variance,’ though, to be accurate, it is the 
sum of squares of the total which is analysed. 

To avoid cumbrous phrases we refer to the sum of squares of family means about 
the mean of the whole as the sum of squares “‘ between families,’ and to that of individuals 
about the respective family-means (for the time being) as “ residual.” We shall also speak 
of total sum of squares and total mean with the obvious significance, and denote degrees 
of freedom by the initial letters “ d.f.” * 


* The need has been felt for a word to denote "sum of squares about the mean". Professor
Pitman has suggested the word "squariance", though he seems to feel that this leaves something to
be desired. In my own notes I use the word "deviance" but have not ventured to introduce it into
the text.


23.8. Since the mean value of $\chi^2$ with $\nu$ degrees of freedom is $\nu$, the quotients in
Table 23.1 are all unbiassed estimators of v, the parent variance. Only the first two, however,
are independent. We recall that the ratio

$$\frac{\dfrac{1}{p-1}\sum_j n_j(x_{.j} - x_{..})^2}{\dfrac{1}{N-p}\sum_{i,j}(x_{ij} - x_{.j})^2} \tag{23.9}$$

is distributed in Fisher's form, which is independent of the variance v. This distribution
accordingly provides a convenient test of significance in the normal case.


Example 23.1 


Let us consider the application of the foregoing theory to a simple example which 
has been chosen to reduce the arithmetic to a small amount. The following shows the 
lives in hours of four batches of electric lamps :— 


Batch 1: 1600, 1610, 1650, 1680, 1700, 1720, 1800. 
Batch 2: 1580, 1640, 1640, 1700, 1750. 

Batch 3: 1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820. 
Batch 4: 1510, 1520, 1530, 1570, 1600, 1680. 


We know that the batches were made from four different specimens of wire, but were otherwise
made under identical conditions. (This, of course, over-simplifies the problem as it
is encountered in practice, but will serve for purposes of illustration.) The question is,
do the batches differ among themselves in length of life? If so, we suspect that the quality
of wire is varying materially, and if the lamps are to be standardised as far as possible the
quality of wire must be made more uniform from batch to batch before manufacture is
undertaken. The numbers in this example are small, but not much smaller than would
be desirable in practice, owing to the expense and time involved in testing a lamp by running
it until it burns out.
The sums of $x$ and $x^2$ for the four batches will be found to be:

                Number in Sample.      Σ(x)          Σ(x²)
    Batch 1             7             11,760       19,785,400
      "   2             5              8,310       13,828,100
      "   3             8             13,090       21,503,700
      "   4             6              9,410       14,778,700
    TOTALS             26             42,570       69,895,900

Thus for the mean life of lamp in the four batches we have 11,760/7 = 1680;
8310/5 = 1662; 13,090/8 = 1636.25; 9410/6 = 1568.33. These certainly differ, but is
the variation such as cannot have arisen by mere sampling fluctuations?
We find

$$x_{..} = 42{,}570/26 = 1637.3077.$$

Thus

$$\sum(x_{ij} - x_{..})^2 = \sum x_{ij}^2 - N x_{..}^2 = 69{,}895{,}900 - 69{,}700{,}189 = 195{,}711.$$

We also have

$$\sum_j n_j(x_{.j} - x_{..})^2 = \sum_j(n_j x_{.j}^2) - N x_{..}^2 = 44{,}360.$$

The analysis then takes the form:

                          Sum of Squares.     d.f.     Quotient.
    Between batches            44,360           3        14,787
    Residual                  151,351          22         6,880
    TOTALS                    195,711          25         7,828

We have

$$z = \tfrac{1}{2}\log_e\frac{14{,}787}{6880} = 0.383, \qquad \nu_1 = 3, \quad \nu_2 = 22.$$
The 5-per-cent. point for these degrees of freedom is seen from the tables to be 0-5574. 
The observed value is therefore not significant, and we conclude that, so far as this test is 
concerned, there is nothing to throw doubt on the homogeneity of the group. 

Having decided, provisionally at least, to accept the hypothesis that the data are
homogeneous, we may ask, what is the best estimate of the parent variance? Our analysis
has given three different estimates, viz. 14,787, 6880 and 7828. It seems natural to use
the last, which depends on the greatest number of degrees of freedom.

With this value we find for the variance of the mean of samples of n the value $7828/n$,
corresponding to a standard error of $88.48/\sqrt{n}$.
The greatest difference of means observed is that between the first and fourth batch,
1680 - 1568.33 = 111.67. The standard error of this difference is

$$88.48\sqrt{\tfrac{1}{7} + \tfrac{1}{6}} = 49.2.$$

The observed difference is rather more than twice the standard error, but we cannot con- 
clude that it is significant on that account. In fact, we have picked out the greatest differ- 
ence for examination from the six possible comparisons of pairs, and the distribution of 
the greatest difference must have a larger standard error than that of a difference chosen 
at random, which is what we have found. Nevertheless the fact that even the greatest 
difference is only slightly in excess of twice the standard error affords some general evidence 
in support of the hypothesis of homogeneity. 

We may also note that if a more accurate test of the difference of two means is required 
the t-test may be invoked ; but here also we must remember that we are testing the greatest 
of a set of differences. Where there are only two families concerned, the analysis of variance 
reduces to the t-test for the difference of sample means when variances of the parents are 
assumed equal. 
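
The arithmetic of this example is easily retraced by machine. The following sketch computes the three sums of squares of Table 23.1 directly from the lamp lives quoted above, together with the quotients and the value of z; the function name `one_way_anova` and the layout of the returned dictionary are ours.

```python
from math import log


def one_way_anova(families):
    """Break up the total sum of squares for a one-way classification."""
    values = [x for family in families for x in family]
    big_n, p = len(values), len(families)
    grand_mean = sum(values) / big_n
    means = [sum(family) / len(family) for family in families]

    between = sum(len(f) * (m - grand_mean) ** 2 for f, m in zip(families, means))
    residual = sum((x - m) ** 2 for f, m in zip(families, means) for x in f)
    total = sum((x - grand_mean) ** 2 for x in values)

    q_between = between / (p - 1)
    q_residual = residual / (big_n - p)
    return {
        "between": between, "residual": residual, "total": total,
        "df": (p - 1, big_n - p, big_n - 1),
        "quotients": (q_between, q_residual, total / (big_n - 1)),
        "z": 0.5 * log(q_between / q_residual),
    }


batches = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]

table = one_way_anova(batches)
print({k: v for k, v in table.items()})
```

The sums of squares between batches and within batches should add to the total, and the value of z should agree with that found above to three decimal places.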


23.9. Suppose now that in the case of one classification we have applied a test by 
means of the analysis of variance and have found that the hypothesis of homogeneity is 




unacceptable, or, in plain English, that the parents do differ. Let us then consider the 
alternative that the populations are still normal and that they differ in their means but 
not in their variances. 

At first sight this may seem a highly artificial assumption to make, for if the popula- 
tions differ in their means it is not unlikely that they may differ in other respects. This 
is undoubtedly so, but if there is serious possibility of difference in variances their homo- 
geneity may be discussed separately by means of tests we shall consider in Chapter 26. 
Apart from this, there often arise in practice situations in which approximate equality of 
variance is plausible on prior grounds. For instance, we may be testing the effect of 
manuring on cereal yields, and it is reasonable to suppose that if the manure exerts any 
effect at all it will increase all plants of the same variety to about the same extent—that 
it will, in fact, displace the location of the distribution of yields without affecting 
its dispersion. 


23.10. The question we have now to consider is whether we can make an estimate
of the common variance of the populations. A little thought will show that we can. The
reasoning which led to the conclusion that the residual sum of squares is distributed as
$v\chi^2$ with $N - p$ degrees of freedom remains unchanged, so that the residual quotient in
Table 23.1 continues to provide an estimator of $v$. The other two no longer do so. Consider,
in fact, the sum of squares between families, and let the mean of the $j$th family be
$m_j$. Then we have

$$E \sum_j n_j (x_{j.} - x_{..})^2 = E \sum_j n_j \{x_{j.} - m_j - (x_{..} - m_{..}) + (m_j - m_{..})\}^2$$
$$= E \sum_j n_j \{x_{j.} - m_j - (x_{..} - m_{..})\}^2 + \sum_j n_j (m_j - m_{..})^2. \qquad (23.10)$$

Here $m_{..}$ is the mean $\frac{1}{N} \sum_j n_j m_j$, and hence the quantities $x_{j.} - m_j$ have the (weighted) mean $x_{..} - m_{..}$. Thus
$\sum_j n_j \{x_{j.} - m_j - (x_{..} - m_{..})\}^2$ is distributed as $v\chi^2$ with $p - 1$ degrees of freedom and

$$E \sum_j n_j (x_{j.} - x_{..})^2 = (p - 1) v + \sum_j n_j (m_j - m_{..})^2. \qquad (23.11)$$

Not unless $m_j = m_{..}$ (that is, unless all populations have the same mean) does the expression
on the right reduce to $(p - 1) v$, and hence the quotient between families give an unbiassed
estimator of $v$. In other cases it is greater.

Similarly,

$$E \sum_{j,k} (x_{jk} - x_{..})^2 = E \sum_{j,k} \{x_{jk} - m_j - (x_{..} - m_{..})\}^2 + \sum_j n_j (m_j - m_{..})^2$$
$$= (N - 1) v + \sum_j n_j (m_j - m_{..})^2. \qquad (23.12)$$

The expectation of the difference of the two terms considered in (23.11) and (23.12) confirms
that the residual sum of squares provides an estimator of $(N - p) v$.
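
The expectation of the sum of squares between families can be illustrated by simulation. The following is a rough sketch under assumed values: the family means, family sizes and variance are invented for the illustration, and the function names are hypothetical. It compares the simulated average with $(p - 1)v + \sum_j n_j (m_j - m_{..})^2$.

```python
import random

def between_ss(samples):
    """Sum of squares between families for a one-way classification."""
    n_total = sum(len(s) for s in samples)
    grand = sum(sum(s) for s in samples) / n_total
    return sum(len(s) * (sum(s) / len(s) - grand) ** 2 for s in samples)

def expected_between_ss(means, sizes, v):
    """(p - 1) v plus the weighted sum of squared deviations of the parent means."""
    n_total = sum(sizes)
    m_bar = sum(n * m for n, m in zip(sizes, means)) / n_total
    shift = sum(n * (m - m_bar) ** 2 for n, m in zip(sizes, means))
    return (len(means) - 1) * v + shift

# Assumed illustrative parents: three normal populations, common variance v.
means, sizes, v = [10.0, 11.0, 13.0], [4, 6, 5], 2.0
rng = random.Random(1)  # fixed seed so that the run is repeatable
reps = 20000

total = 0.0
for _ in range(reps):
    samples = [[rng.gauss(m, v ** 0.5) for _ in range(n)] for m, n in zip(means, sizes)]
    total += between_ss(samples)

avg = total / reps
theory = expected_between_ss(means, sizes, v)
print(avg, theory)  # the simulated average should lie close to the theoretical value
```

When all the assumed means are made equal, the second part of `expected_between_ss` vanishes and the quotient between families is again an unbiassed estimator of the variance.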


23.11. A comparison of the formulae we have already reached and those of section 
14.31 will show that the study of intra-class correlation is very closely related to the analysis 
of variance. It is an interesting exercise to derive the z-test directly from the sampling 
distribution of intra-class r given in equation (14.110) (vol. I, p. 362) and vice-versa. 


Two-way Classification 
23.12. We proceed to the case when the variate-values belong not to one of a single 
set of families but to two, say A and B. In the first instance we shall consider the situation
when there is only a single value in the $j$th class of A and the $k$th class of B. Our sample
may then be set out in the tabular form:

                                    CLASS B
                     B_1       B_2       B_3      . . .     TOTALS

             A_1     x_11      x_12      x_13     . . .     q x_1.
             A_2     x_21      x_22      x_23     . . .     q x_2.
  CLASS A    A_3     x_31      x_32      x_33     . . .     q x_3.        (23.13)
             . . .
             A_p     x_p1      x_p2      x_p3     . . .     q x_p.

          TOTALS     p x_.1    p x_.2    p x_.3   . . .     pq x_..

This is not a contingency table. The numbers $x_{jk}$ are variate-values, not frequencies.
As usual, $x_{j.}$ signifies the mean of values in the class $A_j$ and $x_{.k}$ the mean of values in the
class $B_k$, $x_{..}$ being the mean of the whole.

We have the algebraic identity 


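$$\sum_{j,k} (x_{jk} - x_{..})^2 = \sum_{j,k} \{(x_{jk} - x_{j.} - x_{.k} + x_{..}) + (x_{j.} - x_{..}) + (x_{.k} - x_{..})\}^2$$
$$= \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + \sum_{j,k} (x_{j.} - x_{..})^2 + \sum_{j,k} (x_{.k} - x_{..})^2$$
$$= \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + q \sum_j (x_{j.} - x_{..})^2 + p \sum_k (x_{.k} - x_{..})^2, \qquad (23.14)$$
the cross-product terms vanishing on summation in the usual way.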
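
The identity (23.14) is easily verified on a small table. The sketch below uses a made-up 2 x 3 table (not data from the text) and a hypothetical helper `two_way_sums`; it computes the total sum of squares and the three constituents separately and confirms that they add up.

```python
def two_way_sums(x):
    """x holds p rows (A-classes), each a list of q values (B-classes).

    Returns (total, between A, between B, residual) sums of squares.
    """
    p, q = len(x), len(x[0])
    grand = sum(map(sum, x)) / (p * q)
    row = [sum(r) / q for r in x]                                # means x_j.
    col = [sum(x[j][k] for j in range(p)) / p for k in range(q)]  # means x_.k
    total = sum((x[j][k] - grand) ** 2 for j in range(p) for k in range(q))
    between_a = q * sum((m - grand) ** 2 for m in row)
    between_b = p * sum((m - grand) ** 2 for m in col)
    residual = sum(
        (x[j][k] - row[j] - col[k] + grand) ** 2
        for j in range(p) for k in range(q)
    )
    return total, between_a, between_b, residual

# Made-up illustrative table with p = 2, q = 3.
x = [[3.0, 5.0, 4.0],
     [6.0, 9.0, 6.0]]

total, between_a, between_b, residual = two_way_sums(x)
print(total, between_a, between_b, residual)  # 21.5 13.5 7.0 1.0
```

Here the residual is computed directly from its definition rather than by subtraction, so the agreement of the two sides is a genuine check of the identity.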


23.13. We are interested in the variation of the $x$'s according to class membership.
Let us take as our hypothesis that the pg values are homogeneous, that is to say that they 
all emanate from (normal) populations with the same mean m and variance v. In such 
a case class-membership exerts no influence on variate-values, and the observed differences 
are pure sampling effects. 

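The expression on the left in (23.14) is then distributed as $v\chi^2$ with $pq - 1$ degrees
of freedom. The mean $x_{j.}$ is distributed normally with variance $v/q$ and thus $\sum_j q (x_{j.} - x_{..})^2$
is distributed as $v\chi^2$ with $p - 1$ d.f. Similarly, $\sum_k p (x_{.k} - x_{..})^2$ is so distributed with
$q - 1$ d.f. Finally the remaining term on the right is distributed as $v\chi^2$ with $(p - 1)(q - 1)$
d.f.; for each term is normal with variance $\frac{(p - 1)(q - 1)}{pq} v$, since

$$x_{jk} - x_{j.} - x_{.k} + x_{..} = x_{jk} \left(1 - \frac{1}{q} - \frac{1}{p} + \frac{1}{pq}\right) - \sum_{l \ne k} x_{jl} \left(\frac{1}{q} - \frac{1}{pq}\right) - \sum_{i \ne j} x_{ik} \left(\frac{1}{p} - \frac{1}{pq}\right) + \frac{1}{pq} \sum_{i \ne j,\, l \ne k} x_{il},$$

so that the sum of squares of coefficients on the right is

$$\left\{\frac{(p - 1)(q - 1)}{pq}\right\}^2 + (q - 1) \left(\frac{p - 1}{pq}\right)^2 + (p - 1) \left(\frac{q - 1}{pq}\right)^2 + \frac{(p - 1)(q - 1)}{p^2 q^2} = \frac{(p - 1)(q - 1)}{pq}. \qquad (23.15)$$

Thus, since there are $p + q - 1$ linear relations connecting the $pq$ quantities
$x_{jk} - x_{j.} - x_{.k} + x_{..}$,
their sum of squares is distributed as $v\chi^2$ with $pq - (p + q - 1) = (p - 1)(q - 1)$ degrees
of freedom, which checks against the mean value of the individual square given by (23.15).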
We may thus analyse the variance in the following way :— 


TABLE 23.2

Form of Analysis of Variance for Two-way Classification with One Member in each Subclass

                      Sums of Squares.                                      d.f.              Quotient.

Between A-classes     $q \sum_j (x_{j.} - x_{..})^2$                        $p - 1$           $\frac{q}{p - 1} \sum_j (x_{j.} - x_{..})^2$

Between B-classes     $p \sum_k (x_{.k} - x_{..})^2$                        $q - 1$           $\frac{p}{q - 1} \sum_k (x_{.k} - x_{..})^2$

Residual              $\sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$    $(p - 1)(q - 1)$  $\frac{1}{(p - 1)(q - 1)} \sum_{j,k} (x_{jk} - x_{j.} - x_{.k} + x_{..})^2$

TOTALS                $\sum_{j,k} (x_{jk} - x_{..})^2$                      $pq - 1$

The sums of squares and degrees of freedom (but not the quotients) are additive as 
before. It follows from the theorem of 23.6 that the three constituent sums are inde- 
pendent. Each quotient provides an unbiassed estimator of v. 


23.14. Our use of these results proceeds by an easy generalisation of the method 
exemplified in Example 23.1. We take as our hypothesis the supposition that all samples 
are from normal populations with identical mean and variance. Comparison of the esti- 
mates in the quotient column then provides a test of significance. If the hypothesis is 
rejected we may examine the alternative that means are different but variances identical 
throughout, in which case we shall find that the residual still provides an estimate of the 
variance, provided that an important additional assumption is made. 


Example 23.2 


The following data (Daniels, Supp. J.R.S.S., 1938, 5, 89) show the weight in grams 
of 95-yard lengths of wool thread from 100 “ends” being spun on four bobbins, 25 ends
to the bobbin. We are interested in two factors, the variation between bobbins and the
variation in the 25 ends on the same bobbin, according to their position. 


TABLE 23.3
Weight in Grams of 100 95-yard Lengths of Wool Thread spun on Four Bobbins.

                            Bobbin Number.
End Number.        1         2         3         4        TOTALS.
     1           7.50      7.23      7.50      7.53       29.76
     2           7.52      7.81      7.77      8.05       31.15
     3           7.70      7.94      7.83      8.16       31.63
     4           7.93      7.94      7.96      7.76       31.59
     5           7.78      7.89      8.02      7.85       31.54
     6           7.73      8.23      7.99      8.14       32.09
     7           8.07      8.27      8.25      8.26       32.85
     8           8.01      8.54      8.24      8.54       33.33
     9           8.22      8.24      8.37      8.10       32.93
    10           8.24      8.35      8.43      8.15       33.17
    11           8.17      8.29      8.46      8.38       33.30
    12           8.09      8.54      8.33      8.47       33.43
    13           8.11      8.45      8.27      8.38       33.21
    14           7.96      8.43      8.24      8.60       33.23
    15           8.09      8.47      8.12      8.45       33.13
    16           8.04      8.33      8.14      8.43       32.94
    17           7.78      8.47      8.19      8.57       33.01
    18           8.11      8.63      8.36      8.38       33.48
    19           8.17      8.31      8.31      8.16       32.95
    20           8.12      8.31      8.47      8.41       33.31
    21           8.13      8.10      8.19      8.27       32.69
    22           8.01      8.01      8.37      7.96       32.35
    23           8.17      7.92      8.27      8.08       32.44
    24           8.05      8.27      8.07      8.16       32.55
    25           7.91      7.92      8.28      8.52       32.63
  TOTALS       199.61    204.89    204.43    205.76      814.69


It simplifies the arithmetic if we take a working mean at 8.00. The total sum of
squares about this mean is then found to be
$$\sum_{j,k} (x_{jk})^2 = 9.3829,$$
and we have also
$$\sum_{j,k} (x_{jk}) = 14.69.$$
Hence
$$\sum_{j,k} (x_{jk} - x_{..})^2 = 9.3829 - (0.1469)(14.69) = 7.224939.$$
The means of the four bobbins are
7.9844, 8.1956, 8.1772, 8.2304.
With the same working mean we find for the sum of squares
$$\sum_k (x_{.k})^2 = 0.12298672;$$
and hence
$$p \sum_k (x_{.k} - x_{..})^2 = 25 (0.12298672) - (0.1469)(14.69) = 0.916707.$$

The means of the four ends of corresponding position on the four bobbins can, of
course, be found from the totals in the last column of the table, but it is simpler to find
$\sum_j (q x_{j.} - q x_{..})^2$ and then divide by $q^2$. We find
$$q \sum_j (x_{j.} - x_{..})^2 = \frac{4 (27.1831)}{16} - (0.1469)(14.69) = 4.637814.$$
The continual appearance of the factor $(0.1469)(14.69) = N x_{..}^2$ is to be noted. The
quantity is best computed once for all at the outset.

The residual sum of squares is then obtainable by subtraction, and we have the
following analysis:


TABLE 23.4
Analysis of Variance for the Data of Table 23.3.

                        Sums of Squares.     d.f.     Quotient.
Between bobbins             0.916707           3       0.3056
Between ends                4.637814          24       0.1932
Residual                    1.670418          72       0.0232
TOTALS                      7.224939          99       0.0730

The variation between bobbins and that between ends are both significant—the ratio 
of the corresponding quotients to the residual quotient is so big in each case as hardly to 
require the z-test. We are led to suspect that the variation between bobbins, small as it 
is, cannot be a chance effect, and it looks as if bobbin number 1 is not getting its fair share
of thread. Similarly, the weight of thread seems to be dependent on whereabouts the 
thread is spun on the bobbins, and an inspection of the original data suggests a systematic 
variation as we proceed along the bobbin from end number 1 to end number 25, with a
possible maximum in the middle. If the manufacturing process is to be standardised as 
much as possible, we should have to examine the reasons for the shortage of weight on 
the first bobbin and for this systematic effect of position on the bobbin. 
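
The analysis of Table 23.4 can be reproduced mechanically. The sketch below is not the author's computing scheme (which uses a working mean); it applies the identity (23.14) directly. One assumption should be stated plainly: the bobbin-4 figures used here are back-computed from the printed row totals (taking the total for end 22 as 32.35, the value which reproduces the printed grand total 814.69 and the sum of squares 9.3829), so they should be checked against the original memoir before being relied upon. The function name `two_way_anova` is hypothetical.

```python
# Weights for bobbins 1-4, one row per end (1..25), from Table 23.3.
# ASSUMPTION: the fourth value in each row is row total minus the other three.
ROWS = [
    (7.50, 7.23, 7.50, 7.53), (7.52, 7.81, 7.77, 8.05), (7.70, 7.94, 7.83, 8.16),
    (7.93, 7.94, 7.96, 7.76), (7.78, 7.89, 8.02, 7.85), (7.73, 8.23, 7.99, 8.14),
    (8.07, 8.27, 8.25, 8.26), (8.01, 8.54, 8.24, 8.54), (8.22, 8.24, 8.37, 8.10),
    (8.24, 8.35, 8.43, 8.15), (8.17, 8.29, 8.46, 8.38), (8.09, 8.54, 8.33, 8.47),
    (8.11, 8.45, 8.27, 8.38), (7.96, 8.43, 8.24, 8.60), (8.09, 8.47, 8.12, 8.45),
    (8.04, 8.33, 8.14, 8.43), (7.78, 8.47, 8.19, 8.57), (8.11, 8.63, 8.36, 8.38),
    (8.17, 8.31, 8.31, 8.16), (8.12, 8.31, 8.47, 8.41), (8.13, 8.10, 8.19, 8.27),
    (8.01, 8.01, 8.37, 7.96), (8.17, 7.92, 8.27, 8.08), (8.05, 8.27, 8.07, 8.16),
    (7.91, 7.92, 8.28, 8.52),
]

def two_way_anova(x):
    """Return {item: (sum of squares, d.f., quotient)} for one value per subclass."""
    p, q = len(x), len(x[0])            # p ends (A-classes), q bobbins (B-classes)
    n = p * q
    grand = sum(map(sum, x)) / n
    row_means = [sum(r) / q for r in x]
    col_means = [sum(r[k] for r in x) / p for k in range(q)]
    total = sum((v - grand) ** 2 for r in x for v in r)
    ss_rows = q * sum((m - grand) ** 2 for m in row_means)
    ss_cols = p * sum((m - grand) ** 2 for m in col_means)
    residual = total - ss_rows - ss_cols  # obtained by subtraction, as in the text
    items = {
        "Between bobbins": (ss_cols, q - 1),
        "Between ends": (ss_rows, p - 1),
        "Residual": (residual, (p - 1) * (q - 1)),
        "Totals": (total, n - 1),
    }
    return {name: (ss, df, ss / df) for name, (ss, df) in items.items()}

table = two_way_anova(ROWS)
for name, (ss, df, quotient) in table.items():
    print(f"{name:16s} {ss:10.6f} {df:4d} {quotient:8.4f}")
```

The printed sums of squares agree with Table 23.4 to the six decimals shown there, which also serves as a check on the back-computed column.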


23.15. Suppose now that, as in the example just given, the hypothesis of homogeneity
is rejected. What interpretation can we put on the residual quotient? Let us
assume that each observation comes from a normal population with variance $v$, but that
the parent mean of the subclass $A_j B_k$ is $m_{jk}$, these quantities varying from one subclass
to another. Is the residual quotient an unbiassed estimator of $v$? In general the answer
is “no”, but there is an important class of case in which it is affirmative.

Let $m_{j.}$ be the mean of the $q$ values of $m_{jk}$ in the class $A_j$, $m_{.k}$ that of the $p$ values
in $B_k$, and $m_{..}$ the mean of the whole set of $m$'s. Then we may write

$$x_{jk} = m_{jk} + \xi_{jk}, \qquad (23.16)$$
$$x_{j.} = m_{j.} + \xi_{j.}, \text{ etc.} \qquad (23.17)$$

Then
$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = E \sum (m_{jk} - m_{j.} - m_{.k} + m_{..} + \xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2$$
$$= E \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + E \sum (\xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2, \qquad (23.18)$$
the product term vanishing as usual. The second term on the right is equal to
$(p - 1)(q - 1) v$, for the $\xi$'s are distributed with variance $v$ about zero mean, so that the
term in question is the residual sum of squares in a $p \times q$ two-way classification of a
homogeneous sample and hence has the stated expectation. Thus we have

$$E \sum (x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum (m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + (p - 1)(q - 1) v. \qquad (23.19)$$
The residual quotient will then provide an unbiassed estimator of $v$ if and only if
$$m_{jk} - m_{j.} - m_{.k} + m_{..} = 0. \qquad (23.20)$$


23.16. Now suppose that $x_{jk}$ is made up of three parts which are additive, viz.

(1) the effect of the class $A_j$, say $a_j$;

(2) the effect of the class $B_k$, say $b_k$; and

(3) a residual $\xi_{jk}$ which is normal and has zero mean.
This kind of hypothesis will recur frequently. It amounts to an assumption that there
is in $x_{jk}$ an element $a_j$ which affects alike all members of the class $A_j$ but varies from one
A-class to another; an element $b_k$ which similarly affects alike all members of $B_k$ but varies
from B-class to B-class; and a third component representing random variation which,
apart from the sampling factor, is the same for all subclasses $A_j B_k$. We then have

$$m_{jk} = a_j + b_k \qquad (23.21)$$
and
$$m_{j.} = a_j + b_., \qquad m_{.k} = a_. + b_k, \qquad m_{..} = a_. + b_., \qquad (23.22)$$

where, as usual, the subscript periods in the $a$'s and $b$'s denote averaging. Thus

$$m_{jk} - m_{j.} - m_{.k} + m_{..} = a_j + b_k - (a_j + b_.) - (a_. + b_k) + a_. + b_. = 0,$$

so that (23.20) is satisfied and the residual quotient is an unbiassed estimator of the
variance $v$.
Under the same conditions it will be found that

$$q E \sum_j (x_{j.} - x_{..})^2 = (p - 1) v + q \sum_j (m_{j.} - m_{..})^2 = (p - 1) v + q \sum_j (a_j - a_.)^2 \qquad (23.23)$$
$$p E \sum_k (x_{.k} - x_{..})^2 = (q - 1) v + p \sum_k (b_k - b_.)^2 \qquad (23.24)$$
$$E \sum_{j,k} (x_{jk} - x_{..})^2 = (pq - 1) v + \sum_{j,k} (a_j - a_. + b_k - b_.)^2 = (pq - 1) v + q \sum_j (a_j - a_.)^2 + p \sum_k (b_k - b_.)^2 \qquad (23.25)$$
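
Condition (23.20) and the bias term in (23.19) can be illustrated numerically. In the sketch below the class effects are invented values and the function names are hypothetical. An additive table of parent means $m_{jk} = a_j + b_k$ gives zero for every quantity $m_{jk} - m_{j.} - m_{.k} + m_{..}$, whereas a multiplicative table (chosen merely as one example of non-additivity) does not, and the residual quotient then over-estimates $v$ by the amount computed.

```python
def interaction_of_means(m):
    """Table of m_jk - m_j. - m_.k + m_.. for a p x q table of parent means."""
    p, q = len(m), len(m[0])
    row = [sum(r) / q for r in m]
    col = [sum(m[j][k] for j in range(p)) / p for k in range(q)]
    grand = sum(row) / p
    return [[m[j][k] - row[j] - col[k] + grand for k in range(q)] for j in range(p)]

def residual_bias(m):
    """Excess of the expected residual quotient over v, from equation (23.19)."""
    p, q = len(m), len(m[0])
    squares = sum(t * t for r in interaction_of_means(m) for t in r)
    return squares / ((p - 1) * (q - 1))

# Invented class effects for illustration only.
a = [1.0, 2.5, 4.0]
b = [0.0, 0.5, 2.0, 3.5]

additive = [[aj + bk for bk in b] for aj in a]    # satisfies (23.20)
entangled = [[aj * bk for bk in b] for aj in a]   # violates (23.20)

print(residual_bias(additive))   # zero, apart from rounding error
print(residual_bias(entangled))  # 5.625
```

For the multiplicative table the quantity in (23.20) is $(a_j - a_.)(b_k - b_.)$, so the bias is the product of the two sums of squared deviations divided by $(p - 1)(q - 1)$.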


23.17. We have supposed that the component $\xi$ had a zero mean, but of course if
all these components had the same mean, the constant common to them could be absorbed
into the functions $a_j$ and $b_k$. Our hypothesis is thus a little more general than it appears.
In certain practical cases it is a plausible hypothesis to make. For instance, in Example 
23.2 it is reasonable to suppose that the effect of a particular bobbin is the same for all 
ends, and the effect of situation the same for all bobbins. If there is any serious doubt 
on the point we have to collect further data and consider interactions in the manner 
described later (see 23.22). 

It may, however, be noted that if the variation of the $m_{jk}$'s is comparatively small
the appearance of the term containing them in (23.19) does not materially vitiate an estimate 
of v from the residual quotient. In any case that estimate will be greater than the unbiassed 
estimate, so that our inferences about significant differences of mean values will, properly 
interpreted, be on the safe side. 


23.18. Before going farther we may remark that the quantity we have called the 
residual sum of squares and the associated quotient are often referred to as “error” or 
“interaction” terms. The former is likely to cause misunderstanding and is better avoided
altogether, for, as we have seen, it provides a measure of sampling variance, and there- 
fore of experimental error, only in particular cases. The word “interaction” we shall 
define below ; it has been used in different senses by different writers, and when consulting 
original memoirs the reader should endeavour to ascertain the precise meaning which 
is being attached to it—if he can. In considering a given analysis it is as well to reflect 
on the precise nature of the items covered by such expressions as “residual”, “remainder”,
“error” and so forth.


Three-way Classification 


23.19. Consider now the case when there are three classifications into A-, B- and
C-classes. As before, we shall consider in the first place one member in each subclass
$A_j B_k C_l$ typified by $x_{jkl}$. We now have

$$\sum_{j,k,l} (x_{jkl} - x_{...})^2 = \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 + \sum (x_{..l} - x_{...})^2$$
$$+ \sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 + \sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2 + \sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$$
$$+ \sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2, \qquad (23.26)$$
the summations extending over all members of the sample, $pqr$ in number, so that we may
replace expressions such as $\sum_{j,k,l} (x_{j..} - x_{...})^2$ by $qr \sum_j (x_{j..} - x_{...})^2$, etc.

On the usual hypothesis of normality and homogeneity we find that the first three
terms on the right of (23.26) are distributed as $v\chi^2$ with $p - 1$, $q - 1$ and $r - 1$ degrees
of freedom. The second group is so distributed with $(p - 1)(q - 1)$, $(p - 1)(r - 1)$ and
$(q - 1)(r - 1)$ degrees of freedom. The last is distributed with $(p - 1)(q - 1)(r - 1)$
degrees of freedom. All but the last of these results follow from the two-way case, and
the last may be established as in 23.13 or by the consideration that for any fixed $l$ the
term has $(p - 1)(q - 1)$ degrees of freedom and that there are $(r - 1)$ independent $l$'s.
We may then write the analysis in the form shown in Table 23.5. (For the present
the expression “interaction AB” is to be regarded merely as a name given to a particular
sum of squares. As before, the sums of squares and degrees of freedom are additive,
and the seven items into which the total sum of squares is analysed are distributed
independently.)


TABLE 23.5
Form of Analysis of Variance for Three-way Classification with One Member in each Subclass.

                      Sum of Squares.                                                                              d.f.
Between A-classes     $\sum (x_{j..} - x_{...})^2$                                                                 $p - 1$
Between B-classes     $\sum (x_{.k.} - x_{...})^2$                                                                 $q - 1$
Between C-classes     $\sum (x_{..l} - x_{...})^2$                                                                 $r - 1$
Interaction AB        $\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2$                                             $(p - 1)(q - 1)$
Interaction BC        $\sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$                                             $(q - 1)(r - 1)$
Interaction CA        $\sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$                                             $(r - 1)(p - 1)$
Residual              $\sum (x_{jkl} - x_{jk.} - x_{j.l} - x_{.kl} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2$     $(p - 1)(q - 1)(r - 1)$
TOTALS                $\sum (x_{jkl} - x_{...})^2$                                                                 $pqr - 1$

Quotient (each row): the quotient of the sum of squares by the corresponding d.f.
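
The seven-fold partition of (23.26) can be checked on any three-way array. The sketch below is illustrative: the array is generated from an arbitrary made-up formula, and `three_way_anova` is a hypothetical helper. It computes each sum of squares from its definition, with summation over all $pqr$ members, and confirms that the sums of squares and the degrees of freedom are additive.

```python
from itertools import product

def three_way_anova(x):
    """x[j][k][l] with one member per subclass; returns (sums of squares, d.f., total)."""
    p, q, r = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(p), range(q), range(r)))
    g = sum(x[j][k][l] for j, k, l in cells) / (p * q * r)
    a = [sum(x[j][k][l] for k in range(q) for l in range(r)) / (q * r) for j in range(p)]
    b = [sum(x[j][k][l] for j in range(p) for l in range(r)) / (p * r) for k in range(q)]
    c = [sum(x[j][k][l] for j in range(p) for k in range(q)) / (p * q) for l in range(r)]
    ab = [[sum(x[j][k]) / r for k in range(q)] for j in range(p)]
    ac = [[sum(x[j][k][l] for k in range(q)) / q for l in range(r)] for j in range(p)]
    bc = [[sum(x[j][k][l] for j in range(p)) / p for l in range(r)] for k in range(q)]

    def ss(term):
        # Summation extends over all p*q*r members of the sample.
        return sum(term(j, k, l) ** 2 for j, k, l in cells)

    sums = {
        "A": ss(lambda j, k, l: a[j] - g),
        "B": ss(lambda j, k, l: b[k] - g),
        "C": ss(lambda j, k, l: c[l] - g),
        "AB": ss(lambda j, k, l: ab[j][k] - a[j] - b[k] + g),
        "BC": ss(lambda j, k, l: bc[k][l] - b[k] - c[l] + g),
        "CA": ss(lambda j, k, l: ac[j][l] - a[j] - c[l] + g),
        "Residual": ss(lambda j, k, l: x[j][k][l] - ab[j][k] - ac[j][l] - bc[k][l]
                       + a[j] + b[k] + c[l] - g),
    }
    df = {
        "A": p - 1, "B": q - 1, "C": r - 1,
        "AB": (p - 1) * (q - 1), "BC": (q - 1) * (r - 1), "CA": (r - 1) * (p - 1),
        "Residual": (p - 1) * (q - 1) * (r - 1),
    }
    total = ss(lambda j, k, l: x[j][k][l] - g)
    return sums, df, total

# Arbitrary made-up 4 x 3 x 2 array, for illustration only.
x = [[[(j + 1) * (k + 2) + l * l + (3 * j + 5 * k + 7 * l) % 4 for l in range(2)]
      for k in range(3)] for j in range(4)]

sums, df, total = three_way_anova(x)
print(sum(sums.values()), total)   # the two agree up to rounding error
print(sum(df.values()))            # 23, that is pqr - 1
```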


23.20. If the hypothesis of homogeneity is rejected we may consider the alternative
represented by
$$x_{jkl} = a_j + b_k + c_l + \xi_{jkl}, \qquad (23.27)$$
where $\xi$, as usual, is normal with zero mean. As in 23.16 it will be found that the residual
term in Table 23.5 has expectation $(p - 1)(q - 1)(r - 1) v$, and hence continues to provide
an unbiassed estimator of $v$. The quotients between classes are affected like those in
equations (23.23) to (23.25); but the interaction terms also provide estimators of $v$ with
the appropriate degrees of freedom. For instance,

$$x_{jk.} - x_{j..} - x_{.k.} + x_{...} = a_j + b_k + c_. + \xi_{jk.} - (a_j + b_. + c_. + \xi_{j..})$$
$$- (a_. + b_k + c_. + \xi_{.k.}) + (a_. + b_. + c_. + \xi_{...})$$
$$= \xi_{jk.} - \xi_{j..} - \xi_{.k.} + \xi_{...}, \qquad (23.28)$$
so that the expectation of the sum of squares of the $x$-terms is that of the $\xi$-terms, which
we know to be $(p - 1)(q - 1) v$.


23.21. This brings up a new point arising for the first time in the three-way classi- 
fication. If (23.27) is true, the analysis of variance will provide four different estimators 
of the variance v, namely the interactions AB, BC and CA and the residual. These are 
independent (for they depend only on the $\xi$'s, and the theory appropriate to the case of
homogeneity continues to apply) and their ratios may be tested in the z-distribution. If 
these ratios are such as can have arisen from random sampling we may accept the hypothesis 
represented by (23.27); if not we must reject it. In short, the interaction quotients pro- 
vide a test of the hypothesis (23.27). In the two-way classification no such test is available. 


Interactions 


23.22. On the hypothesis (23.27) the interaction quotients of type AB give unbiassed 
estimators of the variance v. If in any particular case these quotients differ significantly 
among themselves or from any other independent estimator of v, we have to reject the 
hypothesis. Apart from the normality of the variation of $\xi$, which is not for the moment
in question, this means that we cannot represent the data as the sum of separate effects
due to A-, B- and C-classes, together with a residual $\xi$ which is the same in form for all
subclasses. The effects of the classes are entangled or, as we may say, they interact.
This is the origin of the term “interaction”.

Suppose, for instance, our data are crop-yields, and membership of the three classes 
corresponds to applications of three manures, nitrogen (A), potash (B) and phosphate (C). 
The hypothesis represented by (23.27) would then be equivalent to supposing that all three 
manures exerted an effect on yields, but that they did so independently. A given dressing 
of nitrogen would increase the yield by $a_j$, whatever dressings of the other fertilisers were
applied. But it might happen that the response in yield to $a_j$ varied according to how
much of the others were present—potash might either stimulate the effect of nitrogen or 
inhibit it. If this were so, the fertilisers would interact and the hypothesis (23.27) would 
break down. Significant departures from homogeneity in the interaction terms usually 
lead us to search for possible entanglements of this kind. 


23.23. It must not be overlooked, however, that significant interactions do not 
necessarily imply interaction in any real sense. They may arise from heterogeneity in 
the data. To return to our example of crop-yields, suppose the yields were taken from 
a series of plots which differed materially in natural fertility. It might very well be found 
that the hypothesis (23.27) could not be justified even if the differences in yields due to 
the natural effect were partially absorbed into the coefficients a, b and c. If by chance 
the heavier dressings of fertilisers were applied to plots of greater fertility, the hypothesis 
might be shown as failing and “significant” interactions appear. Such points as this
require careful consideration in the interpretation of significance, and we shall illustrate 
them in some examples below. 


23.24. Interactions of type AB, involving two classes, are said to be of the first 
order. When considering the general n-way classification we shall see that there can 
appear interactions of second, third, fourth . . . order. In fact, the residual in Table 23.5 
is formally equivalent to an interaction of the second order, of type ABC, just as the first- 
order interaction is equivalent to the residual in the two-way analysis of Table 23.2. 

To complete the definitions, we may define the sum of squares between A-classes as 
an interaction of order zero. The seven constituent items in Table 23.5 would then 
correspond to the following :— 


               Interaction.     d.f.
Order zero     A                $p - 1$
               B                $q - 1$
               C                $r - 1$
Order 1        AB               $(p - 1)(q - 1)$
               BC               $(q - 1)(r - 1)$
               CA               $(r - 1)(p - 1)$
Order 2        ABC              $(p - 1)(q - 1)(r - 1)$


This illustrates the general symmetry of the analysis and suggests obvious generalisa- 
tions. 


n-way Classifications 
23.25. For instance, with five classes A, B, C, D and E we may analyse the total
sums of squares into $2^5 - 1 = 31$ components. There will be $\binom{5}{1} = 5$ interactions of
order zero; $\binom{5}{2} = 10$ interactions of first order, type AB; $\binom{5}{3} = 10$ interactions of
second order, type ABC; $\binom{5}{4} = 5$ interactions of third order, type ABCD; and one
residual or interaction of fourth order, type ABCDE. The interactions of zero, first and
second order are of a type already familiar:

$$\sum (x_{j....} - x_{.....})^2, \qquad \sum (x_{jk...} - x_{j....} - x_{.k...} + x_{.....})^2,$$
$$\sum (x_{jkl..} - x_{jk...} - x_{j.l..} - x_{.kl..} + x_{j....} + x_{.k...} + x_{..l..} - x_{.....})^2. \qquad (23.29)$$

The third-order interactions are typified by

$$\sum (x_{jklm.} - x_{jkl..} - x_{jk.m.} - x_{j.lm.} - x_{.klm.} + x_{jk...} + x_{j.l..} + x_{j..m.}$$
$$+ x_{.kl..} + x_{.k.m.} + x_{..lm.} - x_{j....} - x_{.k...} - x_{..l..} - x_{...m.} + x_{.....})^2 \qquad (23.30)$$


and the reader will be able to write down the residual for himself. 
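
The counting of interactions, and the pattern of signs in a typical term, follow a simple rule which the sketch below makes explicit. The function names are hypothetical. An interaction of order $s$ involves $s + 1$ of the $n$ classes, so there are $\binom{n}{s+1}$ of them; within an interaction among a given set of classes, each mean retaining some subset of the corresponding subscripts enters with sign $+$ or $-$ according as an even or odd number of subscripts has been averaged out.

```python
from itertools import combinations
from math import comb

def interaction_counts(n):
    """Number of interactions of order 0, 1, ..., n - 1 in an n-way classification."""
    return [comb(n, order + 1) for order in range(n)]

def interaction_terms(letters):
    """Signed means entering the interaction among `letters`, e.g. "ABCD".

    Each entry is (sign, classes whose subscripts are retained); an empty
    selection stands for the mean of the whole.
    """
    terms = []
    for size in range(len(letters), -1, -1):
        sign = (-1) ** (len(letters) - size)
        for kept in combinations(letters, size):
            terms.append((sign, "".join(kept) or "grand mean"))
    return terms

counts = interaction_counts(5)
print(counts, sum(counts))        # [5, 10, 10, 5, 1] 31

third_order = interaction_terms("ABCD")
print(len(third_order))           # 16 terms, as in (23.30)
for sign, kept in third_order:
    print("+" if sign > 0 else "-", kept)
```

Calling `interaction_terms("ABCDE")` lists the 32 signed means which make up the residual for the five-way case.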

As usual, the 31 terms all furnish independent estimators of the variance on the 
hypothesis of homogeneity, and if this is rejected we may consider the alternative 
represented by 

$$x_{jklmn} = a_j + b_k + c_l + d_m + e_n + \xi_{jklmn}. \qquad (23.31)$$


The complete analysis in such cases may become very complex, but frequently it is sufficient 
to consider only sums of squares suggested for investigation by prior expectations. 


Example 23.3 


The following data show the percentage water-content in a number of samples of 
a commercial product. Six samples were chosen ; each sample was tested by four different 
operators ; and each operator carried out the determination by three different methods. 
We have thus a 6 x 4 x 3 classification. 


TABLE 23.6 
Percentage Water-Content of Six Samples determined by Four Operators using Three 
Methods. 
Operators. 
2 3 4 
Samples. = 
Tests. Tests. Tests. 
Pale 1 y 3 1 2 3 
i 60 58 55 58 62 54 56 59 
2 58 58 61 60 57 60 56 58 
3 55 56 54 52 58 53 55 55 
4 57 57 54 58 55 61 59 58 
5 58 59 61 57 60 62 60 60 
6 63 61 64 62 59 59 60 61 




We will first of all analyse the variance systematically with rather more arithmetical 
detail than is usually required, in order to illustrate the process. 
A great deal of work is saved if we take a mean at 60. The table then becomes— 


TABLE 23.7
(Data of Table 23.6 measured from a mean of 60, by Samples and Operators, with totals. Table body not recoverable.)

We have shown the totals of the tests for each operator, of the tests for all operators, and 
of samples for each test. 

We now form three two-way tables from this by adding the values of one of the 
variates, e.g.— 


TABLE 23.8 
Operators. 

2 3 
== [5 — 5 
, = 
— 14 — 16 
Samples. — 10 13 
— 4 —2 
6 5 
— 34 — 33 




TABLE 23.9
(Samples by Tests. Table body not recoverable.)

TABLE 23.10
(Tests by Operators, with totals. Table body not recoverable.)


As we have inserted the totals of various kinds in Table 23.7 these subsidiary tables 
can be picked out at once ; but in general, totals are not available in the original (and for 
four-way classifications it is difficult to find a form of tabular presentation which will permit 
of their insertion) so that the tables have to be separately compiled. In practice I find it 
convenient to do so in any case to avoid picking out the wrong figures in the original table. 

Pursuing the condensation process, we should now derive three one-way tables from 
Tables 23.8 to 23.10, but in fact the row and column totals already give us what is required 
(and incidentally provide a check on the arithmetic). 

Now we proceed to find the various sums of squares. For the total of all observations
we find $-115$, and for the sum of squares of observations 653. Thus

$$x_{...} = -\frac{115}{72} = -1.597222$$
$$N x_{...}^2 = -115\, x_{...} = 183.680556$$
$$\sum (x_{jkl} - x_{...})^2 = \sum (x_{jkl})^2 - N x_{...}^2 = 653 - 183.680556 = 469.319444 \qquad (23.32)$$

with $6 \times 4 \times 3 - 1 = 71$ degrees of freedom.




For the interactions of order zero we require the sums of type
$$\sum (x_{j..} - x_{...})^2 = \sum (x_{j..})^2 - N x_{...}^2,$$
where summation takes place over the $N$ values. It is, however, unnecessary to work out
the means $x_{j..}$. Consider, for example, the sum of squares between samples. From the
totals of Table 23.8 or Table 23.9 we find ($j$ denoting samples)
$$\sum_j (12 x_{j..})^2 = \ldots + (-20)^2 + \ldots + 13^2 = 5009,$$
where the summation is over six values only. Thus, for summation over the 72 values,
$$\sum (x_{j..})^2 = \frac{1}{12}\, 5009 = 417.416667.$$
Hence
$$\sum (x_{j..} - x_{...})^2 = 417.416667 - 183.680556 = 233.736111 \qquad (23.33)$$
with $6 - 1 = 5$ d.f.
Similarly ($k$ denoting operators) we find
$$\sum (x_{.k.} - x_{...})^2 = \frac{3597}{18} - 183.680556 = 16.152778 \qquad (23.34)$$
with 3 d.f.; and ($l$ denoting tests)
$$\sum (x_{..l} - x_{...})^2 = \frac{4491}{24} - 183.680556 = 3.444444 \qquad (23.35)$$
with two degrees of freedom.
Now we require first-order interactions. We have (summation being over the $N$
values)
$$\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 = \sum (x_{jk.} - x_{...})^2 + \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2$$
$$- 2 \sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) - 2 \sum (x_{jk.} - x_{...})(x_{.k.} - x_{...})$$
$$= \sum (x_{jk.} - x_{...})^2 - \sum (x_{j..} - x_{...})^2 - \sum (x_{.k.} - x_{...})^2 \qquad (23.36)$$
and thus the first-order interaction term is ascertainable from $\sum (x_{jk.})^2$ and quantities which
have already been computed.
From the body of Table 23.8 (remembering that summation relates to 72 values and
hence that each value in the table is counted 3 times) we find
$$\sum (x_{jk.})^2 = \frac{1499}{3} = 499.666667.$$
The interaction term is then
$$499.666667 - 183.680556 - 233.736111 - 16.152778 = 66.097222 \qquad (23.37)$$
with $(6 - 1)(4 - 1) = 15$ d.f.
Similarly in the body of Table 23.9 we find for the sum of squares 1915. Hence the
interaction of samples and tests is
$$\frac{1915}{4} - 183.680556 - 233.736111 - 3.444444 = 57.888889. \qquad (23.38)$$




In the body of Table 23.10 the sum of squares is 1245. Hence the interaction of tests
and operators is
$$\frac{1245}{6} - 183.680556 - 16.152778 - 3.444444 = 4.222222. \qquad (23.39)$$
Finally, the residual is given by the difference of the total sum of squares and the
interactions already found, namely by
$$469.319444 - 233.736111 - 16.152778 - 3.444444 - 66.097222 - 57.888889 - 4.222222 = 87.777778 \qquad (23.40)$$
with $(6 - 1)(4 - 1)(3 - 1) = 30$ degrees of freedom.
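
The arithmetic of equations (23.32) to (23.40) can be retraced exactly with rational numbers. The sketch below works only from the summary figures quoted in the text (the total $-115$, the raw sum of squares 653, and the sums of squared totals 5009, 1499, 1915 and 1245). Two inputs are assumptions: the sums of squared operator and test totals, 3597 and 4491, are not legible in this copy and have been back-computed from the printed results 16.152778 and 3.444444. All variable names are illustrative.

```python
from fractions import Fraction as Fr

N = 72                                  # 6 samples x 4 operators x 3 tests
correction = Fr(115 ** 2, N)            # N times the square of the general mean

total = 653 - correction                                    # (23.32)
samples = Fr(5009, 12) - correction                         # (23.33)
operators = Fr(3597, 18) - correction                       # (23.34); 3597 is back-computed
tests = Fr(4491, 24) - correction                           # (23.35); 4491 is back-computed
inter_so = Fr(1499, 3) - correction - samples - operators   # (23.37)
inter_st = Fr(1915, 4) - correction - samples - tests       # (23.38)
inter_ot = Fr(1245, 6) - correction - operators - tests     # (23.39)
residual = total - samples - operators - tests - inter_so - inter_st - inter_ot  # (23.40)

rows = [
    ("Between samples (S)", samples, 5),
    ("Between operators (O)", operators, 3),
    ("Between tests (T)", tests, 2),
    ("Interaction SO", inter_so, 15),
    ("Interaction OT", inter_ot, 6),
    ("Interaction ST", inter_st, 10),
    ("Residual", residual, 30),
]
for name, ss, df in rows:
    print(f"{name:24s} {float(ss):9.3f} {df:3d} {float(ss / df):8.3f}")
print(f"{'Totals':24s} {float(total):9.3f} {sum(df for _, _, df in rows):3d}")
```

Working in exact fractions avoids the accumulation of rounding in the repeated subtraction of the correction term, which the text carries to six decimals for the same reason.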
We can now make up the table of variance analysis as follows :— 


TABLE 23.11
Analysis of Variance of Data of Table 23.7.

                           Sum of Squares.     d.f.     Quotient.
Between samples (S)            233.736           5       46.747
Between operators (O)           16.153           3        5.384
Between tests (T)                3.444           2        1.722
Interaction SO                  66.097          15        4.406
Interaction OT                   4.222           6        0.704
Interaction ST                  57.889          10        5.789
Residual                        87.778          30        2.926
TOTALS                         469.319          71


We proceed to discuss the data in the light of this analysis.
The most striking feature of the table is the size of the quotient between samples.
The variance ratio here is $\frac{46.747}{2.926} = 15.976$, with a corresponding value of $z$ equal to 1.38.
For $\nu_1 = 5$, $\nu_2 = 30$ the 0.1-per-cent. point is 0.8554, and the ratio is highly significant.

We remark in passing on a point which will be taken up later. The ordinary z-test
gives the probabilities that the ratio of two variances chosen at random does not exceed
a given value. But in this case we have deliberately picked out the largest quotient for
one of our estimates. If $z$ had fallen at the 5-per-cent. level we could not have argued that
the odds were 19 to 1 against the event. They are very much less, since we have deliberately
chosen the largest value for comparison with the residual. However, in the present
case our probability is so small that we can confidently assume the significance of $z$ (see
23.27 below).

Our first inference, then, is that the whole sample is not homogeneous. There appear
to be variations from sample to sample which are not assignable to differences between
tests or operators, and if we wished to standardise our product with greater accuracy we
should be led to examine the manufacturing process. This conclusion is, however, subject
to a point which we discuss in the next example.

Having rejected the hypothesis of homogeneity we are now faced with the question
whether the other quotients in Table 23.11 can be compared so as to assess the relative
variability of the other factors. We must then take a new hypothesis, and we will suppose
that the variable may be written
$$x_{jkl} = a_j + \xi_{jkl}, \qquad (23.41)$$


where $a_j$ is an unknown quantity expressing the accepted variation between samples.
Unless there is something very peculiar about the tests or operators it is reasonable to
suppose that the variation between samples can be isolated in this way. We will now
suppose that the $\xi$'s, not the $x$'s, are distributed normally with common mean and variance $v$.
If the values given by (23.41) are substituted in the various constituent items of Table
23.5, it will be found that except for the variation between samples all the other sums of
squares assume the same form with $\xi$ written instead of $x$. This, of course, follows from
23.20 of which our present hypothesis is a particular case. On the hypothesis of (23.41)
we are thus enabled to compare the quotients in the table in the usual way. The element
of variation between samples has, so to speak, been abstracted from the discussion.
We then turn to the sum of squares between operators in Table 23.11. The variance
ratio is $\frac{5.384}{2.926} = 1.84$. For $\nu_1 = 3$, $\nu_2 = 30$ this is not significant. Similarly, for the sum
of squares between tests we find a ratio of $\frac{2.926}{1.722} = 1.70$, again not significant. Provisionally we
conclude that there is no evidence of variation between operators and tests, apart from
pure sampling effects.
Now we have to consider the interactions. For that of SO we have the variance ratio
$\frac{4.406}{2.926} = 1.51$, which is not significant. We find the same for the interaction ST. For
OT we have (taking the larger variance as the numerator)
$$z = \tfrac{1}{2} \log_e \frac{2.926}{0.704} = 0.71, \qquad \nu_1 = 30, \quad \nu_2 = 6.$$

This value is just beyond the 5 per cent. point and, judged by itself, might have been regarded 
as significant ; but taken in conjunction with the others it may, perhaps, be accepted as 
a permissible sampling fluctuation. 
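
The variance ratios and the corresponding values of $z$ used in this discussion are quickly recomputed from the quotients of Table 23.11. The sketch below is illustrative only: the names are hypothetical, $z$ is taken as half the natural logarithm of the variance ratio, and no significance points are computed, since these must be read from tables of the z-distribution.

```python
from math import log

# Quotients (mean squares) as printed in Table 23.11.
QUOTIENTS = {
    "samples": 46.747,
    "operators": 5.384,
    "tests": 1.722,
    "SO": 4.406,
    "OT": 0.704,
    "ST": 5.789,
    "residual": 2.926,
}

def variance_ratio_and_z(item, baseline="residual"):
    """Ratio of the larger to the smaller quotient, and z = (1/2) log_e(ratio)."""
    larger = max(QUOTIENTS[item], QUOTIENTS[baseline])
    smaller = min(QUOTIENTS[item], QUOTIENTS[baseline])
    ratio = larger / smaller
    return ratio, 0.5 * log(ratio)

for item in ("samples", "operators", "tests", "SO", "ST", "OT"):
    ratio, z = variance_ratio_and_z(item)
    print(f"{item:10s} ratio = {ratio:7.3f}   z = {z:6.3f}")
```

Taking the larger quotient as numerator, as the text does for the OT interaction, keeps $z$ positive; the degrees of freedom $\nu_1$ and $\nu_2$ must then be entered in the corresponding order when the tables are consulted.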
To sum up, therefore, the only evidence of deviation from homogeneity appears in the 
sample-differences, and we see no reason to reject the hypothesis represented by (23.41).
Since all the other items in the analysis, apart from that between samples, are homo- 
geneous, we could condense the table into the form— 


                      Sum of Squares.     d.f.     Quotient.
Between samples           233.736           5       46.747
Remainder                 235.583          66        3.569
TOTALS                    469.319          71


The reader may wonder why, in carrying out the tests of significance, we have through- 
out used the residual quotient as the denominator of the variance ratio, and not, for instance, 
one of the interactions. There are two reasons. First, the residual has more degrees of 
freedom, so that it is preferable notwithstanding that the z-test is valid for any number
of degrees of freedom. Second, the residual is not so likely to be affected by interactions
which, though not emerging into significance, might nevertheless exist. But once we have
established that an interaction is not significant, there is no reason why it should not be
amalgamated with the residual, as in the condensed table above.


Example 23.4 

There is a point of great importance concerning the inference from analyses of variance, 
which we will illustrate by an imaginary example based on the data we have just con- 
sidered. Suppose our analysis of variance were of the following form :— 


                      Sum of Squares.     d.f.     Quotient.
Between samples            125              5         25
Between operators           60              3         20
Interaction SO             150             15         10
Remainder                   48             48          1
TOTALS                     383             71


We will suppose that the sums of squares between tests and the other first-order inter- 
actions are not significant, so that they can be amalgamated with the residual to give a 
remainder with 48 degrees of freedom as shown. 

On this evidence the sums of squares between samples and between operators are both
significant, as also is the interaction SO. What inference can be drawn about the
variability of the product from one sample to another? We know that the readings differ
significantly ; but may not this difference itself be due to the demonstrated variation 
between operators, or does it really exist ? Is there in fact any variability in the water- 
content of the product, apart from the sampling effect in homogeneous variation ? 

The significance of the SO interaction means that we cannot now regard the effects 
of operator and sample as independent. We must consider the possibility of entanglement. 
This is not the only explanation—there may be some other specific cause of variation 
present which we have not thought of, and on which our present data throw no light. But 
in this case there is some prior possibility that samples and operators are “entangled” or
interacting in the ordinary sense. An operator may be getting better results from his 
material when it has high water-content than in the reverse case ; or, knowing that the 
mean content is near 60 per cent. he may unconsciously (or even consciously) bring his 
determinations nearer to that figure and hence reduce their spread. 

In a case of this kind, and indeed in all statistical inquiries, it is important to have 
a clear idea of the question which is being asked and of the population to which it relates. 
We have had a number of samples and have tested them by four operators each using 
three tests. So far as we can see, the tests are equivalent but the operators are not. All 
the same, we are not very interested in the variation among operators (unless this is 
an experiment in psychology and not in chemistry). What we want to know is whether 
the water-content varies in reality, that is to say as the average of a large number of 


determinations by different operators. Our particular four are themselves samples of 
a population of operators. 


n-WAY CLASSIFICATIONS 197 


If we confine our attention to the four operators and suppose that each has a specific
reaction to particular samples $m_{jk}$, so that

$$x_{jk} = m_{jk} + \varepsilon_{jk} \tag{23.42}$$

where $\varepsilon_{jk}$ is a normal random residual with variance $v$ for all $j$, $k$, then in the usual
way we find

$$E \sum (x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2 = (p - 1)(q - 1)\,v + \sum (m_{jk} - m_{j\cdot} - m_{\cdot k} + m_{\cdot\cdot})^2. \tag{23.43}$$

But suppose we consider the matter from a different viewpoint. Regard $m_{jk}$ as itself
chosen at random from a normal population of operators with variance $v'$. Then, taking
expectations of this population in addition, we find from (23.43)

$$E \sum (x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2 = (p - 1)(q - 1)(v + v'). \tag{23.44}$$

Thus the interaction term provides an unbiassed estimator of the variance $v + v'$ of $x_{jk}$.
By “unbiassed” in this connection we mean that the average over all determinations and
all operators will give the variance of $x_{jk}$ in the population of all determinations and all
operators.
Similarly we shall have, on the same interpretation,

$$E \sum (x_{j\cdot} - x_{\cdot\cdot})^2 = (p - 1)(v + v'), \qquad E \sum (x_{\cdot k} - x_{\cdot\cdot})^2 = (q - 1)(v + v'), \tag{23.45}$$
and hence the ratio of either interaction of zero order to the first-order interaction may be 
tested for homogeneity. Our analysis then becomes— 


                       Sum of Squares.    d.f.    Quotient.
Between samples              125            5        25
Between operators             60            3        20
Residual (SO)                150           15        10
TOTALS                       335           23


Neither ratio is now significant. For the sum of squares between samples we have
a ratio of 2.5, $\nu_1 = 5$, $\nu_2 = 15$, which is below the 5 per cent. point.
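The two ways of testing in this example can be set side by side in a short sketch. The quotients are those of the imaginary table; the dictionary and function names are illustrative only and are not part of the original text. The significance points themselves would still have to be looked up in tables.

```python
# Quotients (mean squares) from the imaginary analysis of Example 23.4.
quotients = {
    "samples": 25.0,
    "operators": 20.0,
    "interaction SO": 10.0,
    "remainder": 1.0,
}


def variance_ratio(numerator: str, denominator: str) -> float:
    """Ratio of two quotients taken from the table."""
    return quotients[numerator] / quotients[denominator]


# Inference confined to these four operators: test against the remainder (48 d.f.).
against_remainder = {
    name: variance_ratio(name, "remainder")
    for name in ("samples", "operators", "interaction SO")
}

# Operators regarded as a sample of a population of operators:
# test against the SO interaction (15 d.f.).
against_interaction = {
    name: variance_ratio(name, "interaction SO")
    for name in ("samples", "operators")
}

print(against_remainder)
print(against_interaction)
```

The same sum of squares between samples thus yields a ratio of 25 in the first case and of 2.5 in the second, which is the whole point of the example.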

Thus we should conclude that, regarding the data as a member of possible samples from
all possible operators, there is little or no evidence of real variation from sample to sample.
This is quite consistent with the inference we drew at the beginning of the example as to
the “significance” of the terms concerned, though at first sight it appears directly
contradictory. In the first case we inferred that for these four operators there were significant
differences in their determinations for the samples, so that sample-differences are
“real” in the sense that they cannot be attributed solely to random variation in homogeneous
material. In the second case we enlarge the domain by considering operators as
subject to “error” in the sense that one human being differs from another, and find that
sample-differences can now be ascribed to variation in the population of operators.

No further emphasis is needed on the care necessary for the proper interpretation of 
the results of an analysis of variance. The nature of the population which is being con- 
sidered should be brought explicitly to mind in every case; and the reader should form 




the habit of asking himself, whenever a result is found to be “ significant”: significant 
of what ? 


Arithmetic of Variance Analysis 

23.26. Before considering further examples we will dispose of a few points arising 
from the calculation of the constituent sums of squares and the application of the z-test 
in determining the significance of variance-ratios. 

The calculation of sums of squares for an n-way classification can very conveniently 
be carried out by the use of a punched-card system when the data are numerous, and some 
remarkable computing feats have been performed by this technique. For ordinary labora- 
tory work with a machine, the process of Example 23.3 is possibly the best, though some 
modifications may be made to suit individual taste. 

The main work lies in computing the total sum of squares. This is done by finding 
the sum of squares of observations from the original data (with a convenient working 
mean) and the sum of observations obtained at the same time. The formula 

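$$\sum (x_{jk\ldots} - x_{\cdot\cdot\ldots})^2 = \sum x_{jk\ldots}^2 - N x_{\cdot\cdot\ldots}^2 = \sum x_{jk\ldots}^2 - \frac{\left(\sum x_{jk\ldots}\right)^2}{N} \tag{23.46}$$
then gives the total sum required. The quantity $N x_{\cdot\cdot\ldots}^2$ is constantly needed and should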
be recorded. It is useful to preserve a few more decimal places than will ultimately be 
used in the final presentation of the analysis. 
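A minimal sketch of the computation in (23.46) follows. The five readings and the working mean of 60 are invented for illustration and do not come from the text; the function name is likewise hypothetical.

```python
def total_sum_of_squares(values, working_mean=0.0):
    """Return (sum of squares about the mean, correction term).

    Both are computed from deviations about a working mean. The sum of
    squares about the mean does not depend on the working mean chosen;
    the correction term is N times the squared mean of the deviations.
    """
    deviations = [x - working_mean for x in values]
    n = len(deviations)
    sum_dev = sum(deviations)
    sum_dev_sq = sum(d * d for d in deviations)
    correction = sum_dev ** 2 / n
    return sum_dev_sq - correction, correction


readings = [59.2, 60.1, 61.4, 58.7, 61.1]  # invented data
ss, correction = total_sum_of_squares(readings, working_mean=60.0)

# Check against the definition: squared deviations from the actual mean.
mean = sum(readings) / len(readings)
direct = sum((x - mean) ** 2 for x in readings)
print(ss, correction, direct)
```

Keeping the correction term separately mirrors the advice in the text, since the same quantity is subtracted again for every constituent sum of squares.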

The original data are then condensed into n (n − 1)-way tables by summing over
each class in turn. In Example 23.3 this was done so as to give three tables: Operators-Samples,
Tests-Samples and Operators-Tests. The main body of these tables gives means
of the type $x_{jk\cdot}$ multiplied by a constant factor. A further condensation will give $\binom{n}{2}$
sets of means of type $x_{j\cdot\cdot}$; and so on, as far as is required.

From the condensed tables we can then determine the sums of squares of means of 
various orders, and hence the interactions. The main pitfall lies in the way of the applica- 
tion of the correct multipliers and divisors—it has to be borne in mind that the summation 
takes place over all values of the sample. 

Suppose, for example, we have a four-way classification into classes with p, q, r and s
numbers of members. The first condensation gives us four tables of which a typical one
is p × q × r, based on the sum of s members. The next condensation gives us six two-way
tables typified by p × q, based on the sum of rs members. The third gives us four one-way
tables such as p, based on qrs members. Consider the variance between p-classes:

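$$\sum (x_{j\cdot\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 = \sum x_{j\cdot\cdot\cdot}^2 - N x_{\cdot\cdot\cdot\cdot}^2. \tag{23.47}$$

In the condensed one-way table of p classes each term is to be counted qrs times, and
thus, if S is the sum of squares in this table as it stands,

$$S = \sum_{j=1}^{p} \left(qrs\, x_{j\cdot\cdot\cdot}\right)^2.$$

Thus, summing over all members, we find

$$\sum x_{j\cdot\cdot\cdot}^2 = qrs \sum_{j=1}^{p} x_{j\cdot\cdot\cdot}^2 = \frac{S}{qrs}, \tag{23.48}$$

whence (23.47) gives the zero-order interaction for p-classes. Similarly for q, r and s.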


USE OF THE 2z-TEST FOR SEVERAL VARIANCE-RATIOS 199 


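For the first-order interaction we have

$$\sum (x_{jk\cdot\cdot} - x_{j\cdot\cdot\cdot} - x_{\cdot k\cdot\cdot} + x_{\cdot\cdot\cdot\cdot})^2 = \sum (x_{jk\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 - \sum (x_{j\cdot\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 - \sum (x_{\cdot k\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2. \tag{23.49}$$

The last two terms on the right have already been found. We require

$$\sum (x_{jk\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 = \sum x_{jk\cdot\cdot}^2 - N x_{\cdot\cdot\cdot\cdot}^2. \tag{23.50}$$

If S′ is the sum of squares of elements in the body of the two-way table found by adding
r- and s-items, we find

$$\sum x_{jk\cdot\cdot}^2 = \frac{S'}{rs}, \tag{23.51}$$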

and so on. The general process will now be clear. 
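The process may be sketched on a small scale. The data below are invented, and the classification is three-way (2 × 3 × 2) rather than the four-way case of the text, so the divisors are qr, pr and r instead of qrs and rs; the variable names are hypothetical.

```python
# Invented 2 x 3 x 2 data x[j][k][l] (p = 2, q = 3, r = 2); not from the text.
x = [
    [[4, 6], [5, 7], [9, 8]],
    [[3, 5], [6, 6], [10, 12]],
]
p, q, r = 2, 3, 2
n_obs = p * q * r

grand_total = sum(v for plane in x for row in plane for v in row)
correction = grand_total ** 2 / n_obs  # N times the square of the grand mean

# Condensed tables hold totals, not means.
table_jk = [[sum(x[j][k]) for k in range(q)] for j in range(p)]      # sums of r members
table_j = [sum(table_jk[j]) for j in range(p)]                       # sums of q*r members
table_k = [sum(table_jk[j][k] for j in range(p)) for k in range(q)]  # sums of p*r members

# Zero-order terms: (sum of squared totals) / (members per total) - correction.
ss_p = sum(t * t for t in table_j) / (q * r) - correction
ss_q = sum(t * t for t in table_k) / (p * r) - correction

# First-order interaction: two-way table term less the two zero-order terms.
ss_jk = sum(t * t for row in table_jk for t in row) / r - correction
ss_pq = ss_jk - ss_p - ss_q

print(ss_p, ss_q, ss_pq)
```

The only place where slips are likely is the divisor applied to each squared total, which is always the number of original observations summed into that total.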

Unfortunately there is no convenient independent check on the calculations. The 
various condensed tables are self-checking since their totals are the sum of all observations, 
but the sums of squares do not check with anything. It is, of course, possible to evaluate 
each individual term in the residual and to check by summing squares, but this is too 
laborious for use except in the simplest cases. 


Use of the z-test for Several Variance-ratios 


23.27. In the complete analysis of n classes there are $2^n - 1$ elements, and the
number of variance ratios arising for test may be considerable. The z-test gives the probability
that a particular value chosen at random will be exceeded. If therefore we pick
out the largest ratios for test, the chance that one of them is “significant” in the sense
of exceeding the 100P-per-cent. point is a good deal greater than P, and we run into the
danger of attributing significance to what may be a pure sampling effect.

Suppose we make r different and independent tests of r values of z. The chance that 
each does not exceed a fixed value (depending on the number of degrees of freedom) is 
1 — P, where P is some assigned level of significance. Hence the chance that none of 
them exceeds its appropriate value is 


$$(1 - P)^r = 1 - rP, \text{ approximately,} \tag{23.52}$$

provided that P and rP are small. For instance, if P = 0.01 and r = 7 the probability
that no z exceeds its appropriate significance value is 0.93, and thus there is a probability
of 0.07 that at least one of them will do so.
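The figures just quoted can be checked directly. This is only a sketch of (23.52) for independent tests; the function name is illustrative.

```python
def prob_at_least_one_exceeds(level: float, n_tests: int) -> float:
    """Chance that at least one of n independent tests exceeds its point."""
    return 1.0 - (1.0 - level) ** n_tests


P, r = 0.01, 7
exact = prob_at_least_one_exceeds(P, r)
approximate = r * P  # the first-order approximation rP

print(round(1.0 - exact, 2), round(exact, 2), approximate)
```

For small P the exact value and the approximation rP agree closely, but the approximation always overstates the probability slightly.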

In practice the problem of numerous comparisons is more complicated because they 
are not independent. In such circumstances our judgment of significance has to incor- 
porate an element of the intuitive. However, if all the comparisons are based on the 
common residual quotient it is possible to find the probabilities that the largest of r values 
exceeds assigned values. The resulting expressions are complicated, even when all the
sums of squares have the same degrees of freedom, but reference may be made to Hartley 
(1938) for approximations and to Cochran (1941) and Finney (1941a) for exact expressions. 
The conclusion reached by Finney is that if the degrees of freedom in the residual are 
sufficiently numerous the ratios may be treated as completely independent. 


23.28. There is a particular case of the n-way classification which is worth special
mention, namely, that for which each classification is a simple dichotomy, so that there
are $2^n$ subgroups. This case arises frequently when so-called “factorial” experiments
are being conducted to determine the effect of a treatment which is either applied or with- 




held. The analysis of variance remains the same in principle, but of course the arithmetic 
becomes a good deal simpler. 


Example 23.5 (F. Yates, Supp. J.R.S.S., 1935, 2, 181) 

An area of ground was sown with peas and divided into 24 plots in the manner shown 
in Table 23.12. The plots received, or did not receive, dressings of nitrogen (N), phosphate
(P) and potash (K) in the manner shown, the yields in pounds being given in the table. 


TABLE 23.12
Yields of Peas and Manurial Treatments on 24 Plots

    PK      none      N        K
   49.5     46.8     62.0     45.5
    NP       NK      NPK       P
   62.8     57.0     48.8     44.2
    N        K        NP       NK
   59.8     55.5     52.0     49.8
   NPK       P       none      PK
   58.5     56.0     51.5     48.8
    P        N        NK       PK
   62.8     69.5     57.2     53.2
   NPK       K        NP      none
   55.8     55.0     59.0     56.0


There is some purpose here in the alternation of treatments, but that need not concern us 
for the present. We have 24 observations in four classes, viz. blocks (3), nitrogen (2), 
phosphate (2) and potash (2), giving 3 x 2 x 2 x 2 = 24 records. 
Condensing the table by adding blocks we get the following :— 
No treatment     N       P       K       NP      NK      PK      NPK     TOTALS
   154.3       191.3   163.0   156.0   173.8   164.0   151.5   163.1    1317.0


Condensing according to the three treatments we have— 


              N       not-N     TOTALS
P           336.9     314.5      651.4
not-P       355.3     310.3      665.6
TOTALS      692.2     624.8     1317.0

              K       not-K     TOTALS
P           314.6     336.8      651.4
not-P       320.0     345.6      665.6
TOTALS      634.6     682.4     1317.0

              N       not-N     TOTALS
K           327.1     307.5      634.6
not-K       365.1     317.3      682.4
TOTALS      692.2     624.8     1317.0


We omit the remaining calculations. The analysis in its final form is given in 
Table 23.13. 


TABLE 23.13
Analysis of Variance of the Data of Table 23.12

                         Sums of Squares.    d.f.    Quotient.
Between blocks (B)           177.803           2       88.90
Between N                    189.282           1      189.28
Between P                      8.402           1        8.40
Between K                     95.202           1       95.20
Interaction BN                94.255           2       47.13
Interaction BP                 2.260           2        1.13
Interaction BK                23.685           2       11.84
Interaction NP                21.281           1       21.28
Interaction NK                33.134           1       33.13
Interaction PK                 0.481           1        0.48
Interaction BNP               25.302           2       12.65
Interaction BNK               36.004           2       18.00
Interaction BPK                3.782           2        1.89
Interaction NPK               37.003           1       37.00
Residual (BNPK)              128.489           2       64.24
TOTALS                       876.365          23
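For a classification in which every factor is a dichotomy, each treatment sum of squares has one degree of freedom and can be had from a signed sum of the eight treatment totals. The sketch below assumes the totals obtained by adding blocks, with the NP total taken as 173.8 (the sum of the three NP plots, 62.8 + 52.0 + 59.0); the function name and the use of an empty string for "no treatment" are illustrative choices only.

```python
# Treatment totals over the three blocks; "" stands for no treatment.
totals = {
    "": 154.3, "N": 191.3, "P": 163.0, "K": 156.0,
    "NP": 173.8, "NK": 164.0, "PK": 151.5, "NPK": 163.1,
}
n_plots = 24


def effect_ss(effect: str) -> float:
    """Sum of squares (1 d.f.) for a main effect or interaction of N, P, K.

    Each total is given the sign +1 or -1 for every factor named in
    `effect`, according as the treatment does or does not include it.
    """
    contrast = 0.0
    for treatment, total in totals.items():
        sign = 1
        for factor in effect:
            sign *= 1 if factor in treatment else -1
        contrast += sign * total
    return contrast ** 2 / n_plots


for name in ("N", "P", "K", "NP", "NK", "PK", "NPK"):
    print(name, round(effect_ss(name), 3))
```

The values agree with the corresponding entries of the analysis to within a unit or two in the third decimal place, the small differences being presumably due to rounding in the original arithmetic.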


We have carried out the analysis in full so as to illustrate the arithmetical process 
for a four-way classification, but we may note at once that it is unduly elaborate. There 
are only 24 observations in the data and we cannot expect them to provide all the answers 
to the questions which we could frame as to the significance of the various constituent 
items in the analysis. This is borne out by the z-test. The residual variance is 64.24
with two degrees of freedom. For $\nu_1 = 1$, $\nu_2 = 2$ the variance ratio at the 1-per-cent.
point is 98.49 and that for $\nu_1 = 2$, $\nu_2 = 2$ at the same point is 99.00. Only values greater
than about 100 times 64.24 or less than 1/100th of that value would thus be significant.
Only the interaction PK falls outside this range, and even this, among so many, can hardly
be regarded as significant.

The inquiry is not, however, completely frustrated. Since the second-order interactions
are not significant, we amalgamate them with the residual to give a remainder
sum of squares of 230.580 with nine d.f. and a quotient of 25.62. It will now be found




that among the first-order interactions only two are significant, PK and BP being too 
small. Had they been too large we might have attributed some genuine significance to 
this result, but it is not very plausible to suppose that there is a “real” interaction between
blocks and phosphate, or that phosphate and potash inhibit each other’s action. The 
differences from expectation are more probably due to individual soil variation from plot 
to plot. 
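The amalgamation of the second-order interactions with the residual is a matter of adding sums of squares and degrees of freedom. The sketch below takes the figures from Table 23.13; the labels attached to the second-order terms and the variable names are illustrative.

```python
# (sum of squares, d.f.) for the second-order interactions and the residual.
to_pool = {
    "BNP": (25.302, 2),
    "BNK": (36.004, 2),
    "BPK": (3.782, 2),
    "NPK": (37.003, 1),
    "BNPK": (128.489, 2),
}
pooled_ss = sum(ss for ss, _ in to_pool.values())
pooled_df = sum(df for _, df in to_pool.values())
pooled_quotient = pooled_ss / pooled_df

# Quotients of the first-order interactions, each compared with the remainder.
first_order = {"BN": 47.13, "BP": 1.13, "BK": 11.84,
               "NP": 21.28, "NK": 33.13, "PK": 0.48}
ratios = {name: q / pooled_quotient for name, q in first_order.items()}

print(round(pooled_ss, 3), pooled_df, round(pooled_quotient, 2))
for name, ratio in ratios.items():
    print(name, round(ratio, 3))
```

The ratios for BP and PK are the two that stand out, both being far below unity.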

If we accept the first-order interactions as not significant, we may amalgamate them 
with the remainder to give the following :— 


                       Sum of Squares.    d.f.    Quotient.
Blocks (B)                 177.803          2       88.90
N                          189.282          1      189.28
P                            8.402          1        8.40
K                           95.202          1       95.20
Remainder                  405.676         18       22.54
TOTALS                     876.365         23


Here the P-quotient is not significant, but the variance ratio for blocks, 3.94, is near the
5-per-cent. point. The N-quotient will be found to be significant at the 1-per-cent. point,
the K-quotient near to the 5-per-cent. point. Our conclusion is that there is strong indication
that nitrogen influenced the yield, some indication that potash did so, and little indication
that phosphates did so; and that there is ground for suspecting heterogeneity in the
soil partly because of the difference between blocks and partly from some of the first-order
interactions.

In this case, of course, we knew already more or less what was to be expected of these 
data and are the readier to accept the conclusions on that account. Had we known nothing 
of the effect of fertilisers on leguminous crops our conclusions on such slender evidence 
must have been very tentative indeed, particularly if we wished to extend them to peas 
grown on other soils under different climatic conditions with different amounts of fertiliser. 


Example 23.6 (C. E. Gould and W. M. Hampton, Supp. J.R.S.S., 1936, 3, 137) 


In the manufacture of optical glass there appear small bubbles known as “seed”,
which constitute a defect. The glass is made in “pots” which take about a year to prepare,
and are run continuously over long periods when once started. There are two pots
to a furnace and materials are introduced into a pot from time to time which, after fusion,
provide a “run” of glass. Each run provides several days’ work, one day’s work being
known as a “journey”. At each journey quantities of glass are drawn from the pot and
blown into “cylinders”, there being about 18 or 20 to the journey. For the purposes of
the experiment three cylinders were chosen, the third, tenth and sixteenth, and pieces of
regular size cut from them for examination as to frequency of seed. The first five journeys
of each of five runs were sampled.

We have here a four-way classification 2 (pots) x 5 (runs per pot) x 5 (journeys per 
run per pot) x 3 (cylinders per journey per run per pot). The actual dates of the runs 
were February 16th, May 23rd, June 12th, September Ist and December 6th, so that the 
manufacturing period covered about ten months. We shall assume that the glass was 




of the same type throughout, although in actual fact it was different in one or two cases 
—but not sufficiently different to affect the analysis. 

The topic of main interest here is whether the frequency of seed varies significantly
according to the four factors concerned. If so, the alteration of manufacturing conditions
may improve the wastage due to seed; but if not (and the variation is the kind of thing
which can be accounted for as chance fluctuation in sampling from a homogeneous population)
there is little hope of improvement except perhaps by a radical alteration in the
process affecting all pots, runs and journeys alike.


TABLE 23.14
Frequency of “Seed” in Samples of Glass

                             Pot 1.                           Pot 2.
                  Cyl. 1.   Cyl. 2.   Cyl. 3.      Cyl. 1.   Cyl. 2.   Cyl. 3.
Run 1    J 1         47        56       100           52        61        88
         J 2         55        89        93           49        62        97
         J 3         35        57        56           34        60        72
         J 4         78        67       113           47        93       118
         J 5         33        40       128           16        29       130
Run 2    J 1    [illegible]    66        36           65        80        40
         J 2         21        61        49          129        97        79
         J 3         31        39        25           45        54        72
         J 4         43        72        52          109       120        80
         J 5         37        51        67           67        85        63
Run 3    J 1         50        61        60           15       139       130
         J 2         33        27        49           46        58        63
         J 3         24        39        24           15        33        39
         J 4         18        18        43           22        16        19
         J 5         28        42        28           27        19        22
Run 4    J 1         24        34        43           46        66        24
         J 2         24        49        42           40       117       105
         J 3         21        21        51           30        28        34
         J 4         21        69        48           36        64        53
         J 5         76        48        49           39        60        78
Run 5    J 1         31        54        40           19        93        36
         J 2         34        24        46           16        12         2
         J 3        120       122       120           33        58       107
         J 4        109       119       120           25        63        90
         J 5         69        49        60           34        43        30


Before plunging into the analysis of variance it is as well to look over the data to see 
whether they themselves suggest any lines of inquiry. We observe considerable varia- 
bility from journey to journey within the same run, J3 and J4 of run 5 being conspicuous 
in pot 1; and in run 1 the numbers of seed appear to increase from cylinder 1 to cylinder 3
in a rather exceptional way. The runs themselves seem to differ materially. Prior con- 




siderations also suggested an examination of the way in which frequency of seed varied 
between pots, since they were chosen so as to differ substantially in constitution. 
A complete analysis of variance of the data is as follows :— 


TABLE 23.15
Analysis of Variance of the Data of Table 23.14.

                           Sums of Squares.    d.f.    Quotient.
Between pots (P)                  898             1        898
Between runs (R)               14,059             4      3,515
Between journeys (J)            4,355             4      1,089
Between cylinders (C)          10,631             2      5,315
Interaction PR                 16,133             4      4,033
Interaction PJ                  4,081             4      1,020
Interaction PC                    587             2        293
Interaction RJ                 45,934            16      2,871
Interaction RC                 11,626             8      1,453
Interaction JC                  2,540             8        317
Interaction PRJ                 9,711            16        607
Interaction RJC                12,472            32        390
Interaction PRC                 1,656             8        207
Interaction PJC                 1,862             8        233
Residual (PRJC)                 8,110            32        253
TOTALS                        144,655           149


The second-order interactions will be found non-significant, so we amalgamate with 
the residual, giving a sum of squares 33,811, d.f. 96, quotient 352. 

It then appears that of the first-order interactions PR, RJ and RC are significant and
PJ may be so. There is beginning to appear evidence of heterogeneity, and that of a rather
complicated kind. It seems that pots are interacting with runs, runs with journeys and
runs with cylinders.

Taking 352 as the quotient, we find that except for P the zero-order interactions are
significant. The five R-means are 68.50, 62.67, 42.23, 47.77 and 59.27, so that the variation
of runs is not a simple rise or fall, which could have been explained as a time-effect. The
five J-means are 58.93, 55.37, 49.97, 64.83 and 51.33, again not a regular effect. The
C-means are 44.46, 59.68 and 64.12, which are significantly different. Inspection of the
table suggests that the first run is the source of the trouble.

With data as heterogeneous as these it is rather difficult to set up a plausible hypothesis 
to test. The interactions of first order suggest that no simple additive effects of the four 
factors will explain observation, and if these terms are used as denominators in tests of 
variance ratios the variation between classes appears on the whole non-significant on the 
usual hypotheses. The analysis, then, suggests several subjects for inquiry as concerns 
the homogeneity of the data, but does not suggest any simple explanation of the observed 
figures. The reader may care to refer to the original paper for a more complete discussion 
of the subject. 


NON-NORMAL DATA 205 


23.29. Perhaps we may pause at this point to review progress. We have seen
that for an n-way classification of the special type wherein each subclass contains a single
member, the sum of squares of all observations about their mean can be exhibited as the
sum of a number of such sums. On the hypothesis of normality and homogeneity each
constituent sum of squares, on division by its appropriate number of degrees of freedom,
gives an estimator of the parent variance, and each is distributed as $\chi^2$ independently of
the others. The hypothesis of homogeneity can then be tested in Fisher’s z-distribution,
subject to the adoption of a conservative attitude where many tests are made on the same
data. If the hypothesis is rejected we may replace it by a simple form in which the effects
of the different classes are additive, provided that the interactions are not significant.
The particular ratio chosen for a test depends on the hypothesis concerned, and it is important
to have a clear idea of the exact question to which an answer is sought.


23.30. In the next chapter we shall consider the case when the numbers in different 
subclasses are not equal, discuss the additive hypothesis in more detail, examine the relation- 
ship of variance- and regression-analysis, and extend our results to the analysis of covariance. 
We conclude this chapter by an examination of the important question: what can be 
done with the analysis of variance when the variation is not normal ? 


Non-normal Data 


23.31. The analysis of a sum of squares into its constituent sums can, of course, be 
undertaken in all circumstances, but the various quotients may not continue to provide 
unbiassed estimators of the parent variance if the population is not-normal. What is 
equally serious, the constituent sums of squares may not be distributed independently. 
Thus, when parent normality cannot be assumed, the quotients in the analysis table are 
no longer equal within sampling limits and their ratio is distributed in unknown form ; and 
even if the form were known it would probably depend on parent parameters and hence 
fail to provide an exact test of significance. 

The problem has been considered in four ways :— 


(a) Sampling experiments have been undertaken to see how far moderate deviation 
from normality affects the z-distribution ; 

(b) Attempts have been made to find transformations of the variate to throw the
parent distributions into forms with equal variances, at least approximately, 
before the analysis is applied ; 

(c) By introducing a randomising process into the data before they are collected, 
attempts have been made to preserve the z-distribution as a close approximation 
—this amounts to a change in the nature of the inference, as we shall see below ; 

(d) Tests have been found which can be applied to ranked data irrespective of the 
parent form—this approach is a particular case of (c), but seems to merit special 
mention. 


We proceed to consider these four possibilities. 


23.32. The arithmetic entailed by a single analysis of variance, even in simple cases,
implies that an extensive sampling inquiry into the distribution of z in non-normal populations
would be a very formidable undertaking. E. S. Pearson (1931b) has studied in some
detail the case of a one-way classification with unequal numbers, when the distribution




of z becomes equivalent to that of the correlation ratio $\eta^2$. Six populations were chosen,
characterised by the following values:


$\beta_1 = 0$, $\beta_2 = 2.50$ (symmetrical platykurtic);
$\beta_1 = 0$, $\beta_2 = 4.1$ (symmetrical leptokurtic);
$\beta_1 = 0$, $\beta_2 = 7.05$ (symmetrical leptokurtic);
$\beta_1 = 0.2$, $\beta_2 = 3.3$ (skew, Type III);
$\beta_1 = 0.49$, $\beta_2 = 3.72$ (skew, Type III);
$\beta_1 = 0.99$, $\beta_2 = 3.83$ (very skew, Type I, with abrupt start).


The results suggested that for this range of $\beta_1$ and $\beta_2$ the distribution of z is adequately
represented by Fisher’s distribution, and that therefore the homogeneity test may be
applied. The case when the variation changed from group to group was not considered.
It was also concluded that “it seems probable that the more elaborate forms of analysis
of variance are also of fairly wide application”.

Some work by Eden and Yates (1933) is often referred to as experimental confirmation 
of the same kind, but in fact it was carried out with rather a different object, that of con- 
firming the z-test for data under randomisation (see below, 23.36). 


Variate Transformations 
23.33. Suppose $\xi$ is a new variate, a function of $x$. Then approximately we shall have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2_{x=m} \operatorname{var} x. \tag{23.53}$$

If now the parent variance of the x-distribution is related in some known manner to the
mean, say $f(m) = v$, we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2_{x=m} f(m).$$

As a further approximation, if $x$ varies about $m$ by small quantities we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 f(x). \tag{23.54}$$

Now we wish $\xi$ to have a constant variance, say $\lambda^2$, and if this is so,

$$\frac{d\xi}{dx} = \frac{\lambda}{\sqrt{f(x)}}$$

or

$$\xi = \int \frac{\lambda\,dx}{\sqrt{f(x)}}. \tag{23.55}$$

Although this expression is arrived at by approximation we are entitled to hope that
the variate $\xi$ will have almost constant variance, and at any rate a more stable variance
than $x$.

For instance, if the original variation is thought to be of the Poisson type we have
$f(x) = x$, and from (23.55) are led to consider the transformation

$$\xi = \int \frac{\lambda\,dx}{\sqrt{x}} = \sqrt{x}, \tag{23.56}$$


VARIATE TRANSFORMATIONS 207 


if we choose $\lambda$ to be $\tfrac{1}{2}$. Similarly, if the variation is of the binomial type with variance
$p(1 - p)$ we have

$$\xi = \int \frac{\lambda\,dx}{\sqrt{x(1 - x)}} = \sin^{-1}\sqrt{x}, \tag{23.57}$$

on suitable choice of $\lambda$.


23.34. These transformations are designed to “stabilise” the variance. They do
not necessarily bring the variate closer to normality, though in some cases they will do so;
we have, for instance, seen that $\sqrt{\chi^2}$ tends to normality quicker than $\chi^2$ (12.7). The
following values (Bartlett 1936d) illustrate the way in which the square-root transformation
stabilises the variance of a Poisson distribution:


Mean m.      Variance of Poisson        Variance of Poisson
             Variate $\sqrt{x}$.        Variate $\sqrt{x + \tfrac{1}{2}}$.
   0.0             0.000                      0.000
   0.5             0.310                      0.102
   1.0             0.402                      0.160
   2.0             0.390                      0.214
   3.0             0.340                      0.232
   4.0             0.306                      0.240
   6.0             0.276                      0.245
   9.0             0.263                      0.247
  12.0             0.259                      0.248
  15.0             0.256                      0.248


The term $\tfrac{1}{2}$ in the third column was added by Bartlett on the analogy of a continuity
correction. For m > 3 the variance is evidently quite stable.
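Values of this kind can be obtained by direct summation over the Poisson distribution. The following is a sketch only: the function name, the stopping tolerance and the choice of means printed are assumptions made for illustration, and no claim is made that the tabulated values were computed in this way.

```python
import math


def poisson_sqrt_variance(mean: float, shift: float = 0.0, tol: float = 1e-15) -> float:
    """Variance of sqrt(X + shift) when X is a Poisson variate with given mean.

    Uses var = E(X + shift) - [E sqrt(X + shift)]^2, the expectation of the
    square root being summed term by term until the terms are negligible.
    """
    if mean == 0.0:
        return 0.0  # X is then always zero
    prob = math.exp(-mean)  # P(X = 0)
    k = 0
    expected_root = 0.0
    while k <= mean or prob > tol:
        expected_root += prob * math.sqrt(k + shift)
        k += 1
        prob *= mean / k  # P(X = k) from P(X = k - 1)
    return (mean + shift) - expected_root ** 2


for m in (0.5, 1.0, 2.0, 4.0, 9.0, 15.0):
    print(m, round(poisson_sqrt_variance(m), 3),
          round(poisson_sqrt_variance(m, shift=0.5), 3))
```

Both columns approach the limiting value of one quarter as the mean increases, which corresponds to the choice of $\lambda = \tfrac{1}{2}$ in the square-root transformation.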


23.35. If now, having stabilised the variance, we carry out an analysis in the ordinary 
way, our residual sums of squares divided by the appropriate degrees of freedom will con- 
tinue to be unbiassed estimates of the common variance v, even if there are differences 
between the means of the classes. Instead of assuming as part of the hypothesis that the 
different classes are distributed with the same variance, we have transformed the variate 
so that this shall be so, at least to a close approximation. Relying further on the result 
that the transformed variates approximate to normality, or that if they do not the differ- 
ence will not seriously vitiate the z-test, we may apply that test to the transformed data 
in the usual way. 


Example 23.7 (Bartlett, 1936d) 
Table 23.16 shows the number of wheat seeds out of 50 which failed to germinate in 
four repetitions of an experiment with different treatments. 




TABLE 23.16
Germination of Wheat Seeds

Number of                          Number of Treatment.
Experiment.       1      2      3      4      5      6      7     TOTALS
     1           10     11      8      9      7      6      9       60
     2            8     10      3      7      9      3     11       51
     3            5     11      2      8     10      7     11       54
     4            1      6      4     13      7     10     10       51
TOTALS           24     38     17     37     33     26     41      216


In point of fact, treatment 7 was a repetition of treatment 6, the others being different. 
The point of interest is whether the treatments exert any effect on germination. We shall 
not inquire into any differences between experiments (which appear to be negligible from 
the row totals) and shall accordingly consider this as a one-way classification into seven 
classes, four numbers to the class. 

The presumption is that in any given class the variation is of the binomial type. We
might apply the $\sin^{-1}\sqrt{x}$ transformation, but will adopt instead an ad hoc square-root
transformation obtained as follows:

We have

$$v = np(1 - p).$$

Suppose now that $p = p_0 + \delta$ where $\delta$ is small. Then, neglecting $\delta^2$,

$$v = n(p_0 + \delta - p_0^2 - 2p_0\delta) = n\{(1 - 2p_0)(p - p_0) + p_0 - p_0^2\} = np(1 - 2p_0) + np_0^2.$$

If we now put

$$\xi = \sqrt{x + k + \tfrac{1}{2}}$$

where $k = \dfrac{np_0^2}{1 - 2p_0}$ and $x$ is the observed frequency, then $\xi$ will tend to have constant
variance.


In our example the total frequency is 216 out of 1400 seeds, so that we may take as
an estimate of $p_0$ the ratio 216/1400 = 0.15. With n = 50 this gives k = 1.6 nearly, and the
transformed variate then becomes

$$\xi = \sqrt{x + k + \tfrac{1}{2}} = \sqrt{x + 2}, \text{ approximately.}$$


RANDOMISATION 209 


On this basis the transformed variate-values are— 


TABLE 23.17
Transformed Variates of Table 23.16

Number of                               Number of Treatment.
Experiment.       1        2        3        4        5        6        7      TOTALS
     1          3.464    3.606    3.162    3.317    3.000    2.828    3.317    22.694
     2          3.162    3.464    2.236    3.000    3.317    2.236    3.606    21.021
     3          2.646    3.606    2.000    3.162    3.464    3.000    3.606    21.484
     4          1.732    2.828    2.449    3.873    3.000    3.464    3.464    20.810
TOTALS         11.004   13.504    9.847   13.352   12.781   11.528   13.993    86.009


The analysis of variance is:

                        Sums of Squares.    d.f.    Quotient.
Between treatments           3.486            6       0.581
Residual                     4.316           21       0.206
TOTALS                       7.802           27


The sum of squares is particularly easy to obtain, being the sum of the original variates 
plus twice the number of variate-values. 

The variance ratio, 2.8, is barely significant, being just beyond the 5-per-cent. point.
There is little evidence that treatments are exerting any effect on germination, since a
comparison of treatments 6 and 7 (which are the same) indicates that such “significance”
as exists may be due to heterogeneity in the seed.
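The arithmetic of this example can be retraced as follows. The sketch assumes the counts for treatments 2 to 6 as tabulated and takes those for treatments 1 and 7 as the values implied by the totals of the transformed table and the grand total of 216; the variable names are illustrative. Since the square roots are not rounded to three places here, the results differ from the printed analysis in the third decimal.

```python
import math

# Failures to germinate out of 50 seeds.
# Rows: experiments 1-4. Columns: treatments 1-7 (columns 1 and 7 inferred, see above).
counts = [
    [10, 11, 8, 9, 7, 6, 9],
    [8, 10, 3, 7, 9, 3, 11],
    [5, 11, 2, 8, 10, 7, 11],
    [1, 6, 4, 13, 7, 10, 10],
]
n_seeds = 50
n_rows, n_cols = len(counts), len(counts[0])
n_values = n_rows * n_cols

# Constant added before taking square roots: k + 1/2, rounded as in the text.
p0 = sum(map(sum, counts)) / (n_seeds * n_values)
k = n_seeds * p0 ** 2 / (1 - 2 * p0)
shift = round(k + 0.5)

xi = [[math.sqrt(c + shift) for c in row] for row in counts]
correction = sum(map(sum, xi)) ** 2 / n_values

# The crude sum of squares of xi is simply the sum of (x + shift).
total_ss = sum(c + shift for row in counts for c in row) - correction
treatment_totals = [sum(row[t] for row in xi) for t in range(n_cols)]
between_ss = sum(t * t for t in treatment_totals) / n_rows - correction
residual_ss = total_ss - between_ss

ratio = (between_ss / (n_cols - 1)) / (residual_ss / (n_values - n_cols))
print(shift, round(total_ss, 3), round(between_ss, 3), round(ratio, 2))
```

Note how the crude sum of squares requires no squaring at all, being the sum of the original counts plus twice the number of values.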


Randomisation 

23.36. Consider a two-way classification of pq members, the observed value of the
jth A-member of the kth B-class being $x_{kj}$. Following the line already considered in 21.48,
we will consider the z-distribution in the population of values obtained by permuting the
members in any A-class in all possible ways. There will thus be $(q!)^p$ possible values of
z, all based on the observed values. We have already considered a case of this kind in
dealing with the problem of m rankings (16.29) and we shall follow the same procedure
in solving the more general problem.




Let the values be arrayed as
$$\begin{matrix} x_{11} & x_{12} & \cdots & x_{1q} \\ x_{21} & x_{22} & \cdots & x_{2q} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pq} \end{matrix} \tag{23.58}$$

If $S_P$ is the sum of squares between rows, $S_Q$ that between columns and $S$ the total, we
know that in the ordinary case considered earlier in the chapter, $S_Q$ is distributed as $v\chi^2$
with $q - 1$ d.f., and $S - S_P - S_Q$ as $v\chi^2$ with $(p - 1)(q - 1)$ d.f. It follows that
$$\frac{S_Q}{S - S_P} = W, \text{ say,} \tag{23.59}$$
is distributed in the Type I form
$$dF \propto W^{\frac{1}{2}(q-3)} (1 - W)^{\frac{1}{2}(p-1)(q-1) - 1}\, dW. \tag{23.60}$$

It is easier to work with $W$ than with z, but there is of course no difficulty in passing from
one to the other.

We proceed to find the first four moments of $W$ in the population of $(q!)^p$ values obtained
by permuting the rows of (23.58) in all possible ways.


23.37. If in (23.58) we increase the members of any row by a constant $a$, it is easily
seen that $S_Q$ and $S - S_P$ remain unaffected, and hence so does $W$. Thus we may take
the mean of each row to be zero and then $S_P = 0$. With this origin we have
$$W = \frac{S_Q}{S} = \frac{\sum_{j=1}^{q} \left( \sum_{i=1}^{p} x_{ij} \right)^2}{p \sum_{i,j} x_{ij}^2}. \tag{23.61}$$
If now
$$R_{il} = \sum_{j=1}^{q} x_{ij} x_{lj}, \tag{23.62}$$
and the k-statistics of the $q$ values $x_{ij}$, $j = 1 \ldots q$, are written $k_{i1}$, $k_{i2}$, etc., and
$$U = \sum_{i<l} R_{il}, \tag{23.63}$$
we find
$$W = \frac{1}{p} + \frac{2U}{p(q-1) \sum_i k_{i2}}, \tag{23.64}$$
$$E(R_{il}) = 0, \tag{23.65}$$
$$E(R_{il}^2) = (q-1)\, k_{i2} k_{l2}, \tag{23.66}$$
$$E(R_{il}^3) = \frac{(q-1)(q-2)}{q}\, k_{i3} k_{l3}, \tag{23.67}$$
$$E(R_{il}^4) = \frac{3(q-1)^3}{q+1}\, k_{i2}^2 k_{l2}^2 + \frac{(q-1)(q-2)(q-3)}{q(q+1)}\, k_{i4} k_{l4}. \tag{23.68}$$


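Then, for the moments of $U$,
$$E(U) = 0, \tag{23.69}$$
$$E(U^2) = (q-1) \sum{}'' k_{i2} k_{j2}, \tag{23.70}$$
$$E(U^3) = 6(q-1) \sum{}'' k_{i2} k_{j2} k_{l2} + \frac{(q-1)(q-2)}{q} \sum{}'' k_{i3} k_{j3}, \tag{23.71}$$
$$E(U^4) = 3(q-1)^2 \left( \sum{}'' k_{i2} k_{j2} \right)^2 - \frac{6(q-1)^2}{q+1} \sum{}'' k_{i2}^2 k_{j2}^2 + \frac{(q-1)(q-2)(q-3)}{q(q+1)} \sum{}'' k_{i4} k_{j4}$$
$$\qquad + \frac{12(q-1)(q-2)}{q} \sum{}'' k_{i3} k_{j3} k_{l2} + 72(q-1) \sum{}'' k_{i2} k_{j2} k_{l2} k_{m2}, \tag{23.72}$$
where $\sum''$ denotes summation over values for which the subscripts are unequal and permutations
are not allowed, so that each distinct product is counted once.

Finally, for the moments of $W$ we have
$$E(W) = \bar{W} = \frac{1}{p}, \tag{23.73}$$
$$E(W - \bar{W})^2 = \frac{4 \sum'' k_{i2} k_{j2}}{p^2 (q-1) \left( \sum k_{i2} \right)^2}, \tag{23.74}$$
$$E(W - \bar{W})^3 = \frac{48 \sum'' k_{i2} k_{j2} k_{l2}}{p^3 (q-1)^2 \left( \sum k_{i2} \right)^3} + \frac{8(q-2) \sum'' k_{i3} k_{j3}}{p^3 q (q-1)^2 \left( \sum k_{i2} \right)^3}, \tag{23.75}$$
$$E(W - \bar{W})^4 = \frac{48 \left( \sum'' k_{i2} k_{j2} \right)^2}{p^4 (q-1)^2 \left( \sum k_{i2} \right)^4} - \frac{96 \sum'' k_{i2}^2 k_{j2}^2}{p^4 (q-1)^2 (q+1) \left( \sum k_{i2} \right)^4} + \frac{1152 \sum'' k_{i2} k_{j2} k_{l2} k_{m2}}{p^4 (q-1)^3 \left( \sum k_{i2} \right)^4}$$
$$\qquad + \frac{16(q-2)(q-3) \sum'' k_{i4} k_{j4}}{p^4 q (q+1)(q-1)^3 \left( \sum k_{i2} \right)^4} + \frac{192(q-2) \sum'' k_{i3} k_{j3} k_{l2}}{p^4 q (q-1)^3 \left( \sum k_{i2} \right)^4}. \tag{23.76}$$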


These formulae can be derived in the manner of 16.33, but reference may be made to 
Pitman (1938) for further details. 
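For a small array the permutation population can be enumerated completely, which gives a direct check on the first two moments of $W$, namely $E(W) = 1/p$ and $\operatorname{var} W = 4\sum'' k_{i2}k_{j2} / \{p^2(q-1)(\sum k_{i2})^2\}$. The following sketch does this for a hypothetical $3 \times 3$ array; the data and all names are illustrative assumptions, not taken from the text.

```python
from itertools import permutations, product
from statistics import variance

# Hypothetical data: p = 3 rows (A-classes) of q = 3 values each.
rows = [[1.0, 2.0, 6.0], [0.0, 3.0, 4.0], [2.0, 5.0, 9.0]]
p, q = len(rows), len(rows[0])

def w_statistic(arr):
    """W = S_Q / (S - S_P) for a p x q array given as a sequence of rows."""
    row_means = [sum(r) / q for r in arr]
    grand = sum(row_means) / p
    col_means = [sum(arr[i][j] for i in range(p)) / p for j in range(q)]
    s_q = p * sum((c - grand) ** 2 for c in col_means)        # between columns
    s_minus_s_p = sum((arr[i][j] - row_means[i]) ** 2         # total less rows
                      for i in range(p) for j in range(q))
    return s_q / s_minus_s_p

# Every row permuted independently: (q!)^p = 216 arrangements.
values = [w_statistic(arr)
          for arr in product(*(list(permutations(r)) for r in rows))]

mean_w = sum(values) / len(values)
var_w = sum((w - mean_w) ** 2 for w in values) / len(values)

# Second k-statistics of the rows (sample variances with divisor q - 1).
k2 = [variance(r) for r in rows]
pair_sum = sum(k2[i] * k2[l] for i in range(p) for l in range(i + 1, p))
var_formula = 4 * pair_sum / (p ** 2 * (q - 1) * sum(k2) ** 2)

print(mean_w, 1 / p)
print(var_w, var_formula)
```

Complete enumeration is feasible only for very small $p$ and $q$; for larger arrays a random sample of permutations would have to take its place.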


23.38. We now consider how far the first four moments of $W$, as found above, agree
with the first four moments of the distribution (23.60). The mean and variance of the
latter are
$$\frac{1}{p} \quad \text{and} \quad \frac{2(p-1)}{p^2 (pq - p + 2)}. \tag{23.77}$$
The means agree exactly. For the variances to agree we must have, from (23.74) and
(23.77),
$$\frac{4 \sum'' k_{i2} k_{j2}}{p^2 (q-1) \left( \sum k_{i2} \right)^2} = \frac{2(p-1)}{p^2 (pq - p + 2)}. \tag{23.78}$$
Writing
$$K = \frac{2 \sum'' k_{i2} k_{j2}}{\left( \sum k_{i2} \right)^2}, \tag{23.79}$$
we find that (23.78) is equivalent to
$$K = \frac{(p-1)(q-1)}{pq - p + 2}. \tag{23.80}$$

The ratio $K$ may have any value from 0 to $\dfrac{p-1}{p}$, the lower limit being approached when
one of the second k-statistics $k_{i2}$ is much larger than the others, the upper limit when they
are all equal. Hence all that can be said about the variance of $W$ is that it is not greater
than $\dfrac{2(p-1)}{p^3 (q-1)}$ and that it takes this value when the variance of each p-class is the same.

Turning to the third and fourth moments, we note that in many cases where the variation
is not too skew the quantities $k_{i3}$ and $k_{i4}$ will be negligible. A number of terms in
(23.75) and (23.76) may thus be neglected, but even those that remain are fairly complicated,
and it is difficult to say how far the distribution of $W$ will approach the Type I
distribution (23.60). In practice the values may be worked out and compared. If there
is reasonable agreement, the z-distribution of the variance ratio will hold in the particular
population which we are considering.


23.39. A better approach is to find the Type I distribution which has the same first
two moments as $W$ and to modify the z-test where necessary. It may be shown that when
$K$ is not too small the third and fourth moments of $W$ and the fitted Type I distribution
are in fairly good agreement, so that we may expect a good fit.

The Type I distribution with mean $\dfrac{1}{p}$ and variance $\dfrac{2K}{p^2 (q-1)}$ has the mean and variance
of $W$ by definition. Its third moment is easily seen to be
$$\frac{8K^2 (p-2)}{p^3 (q-1) \{ (p-1)(q-1) + 2K \}}. \tag{23.81}$$
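The third moment of the fitted Type I distribution can be verified numerically. The sketch below fits the two parameters of a Type I (beta) distribution to the mean $1/p$ and variance $2K/\{p^2(q-1)\}$, obtains the third central moment from the raw moments, and compares it with the closed form $8K^2(p-2) / [p^3(q-1)\{(p-1)(q-1)+2K\}]$. The function names and the trial values of $p$, $q$ and $K$ are illustrative assumptions.

```python
def beta_raw_moment(a, b, r):
    """E[W^r] for a Type I (beta) variate with density ~ W^(a-1) (1-W)^(b-1)."""
    moment = 1.0
    for i in range(r):
        moment *= (a + i) / (a + b + i)
    return moment

def fitted_parameters(p, q, K):
    """Beta parameters (a, b) giving mean 1/p and variance 2K / (p^2 (q-1))."""
    mean = 1.0 / p
    var = 2.0 * K / (p ** 2 * (q - 1))
    total = mean * (1.0 - mean) / var - 1.0      # a + b
    return mean * total, (1.0 - mean) * total

def fitted_third_moment(p, q, K):
    a, b = fitted_parameters(p, q, K)
    m1, m2, m3 = (beta_raw_moment(a, b, r) for r in (1, 2, 3))
    return m3 - 3 * m1 * m2 + 2 * m1 ** 3        # third central moment

def closed_form(p, q, K):
    return 8 * K ** 2 * (p - 2) / (
        p ** 3 * (q - 1) * ((p - 1) * (q - 1) + 2 * K))

p, q, K = 8, 4, 0.7577                           # trial values only
print(fitted_third_moment(p, q, K), closed_form(p, q, K))
```

When $K$ takes the value $(p-1)(q-1)/(pq-p+2)$ the fitted parameters reduce to $\frac{1}{2}(q-1)$ and $\frac{1}{2}(p-1)(q-1)$, those of the ordinary Type I form of $W$.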


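We have to see how far this differs from the actual third moment of $W$ given by (23.75).
Now
$$3 \sum{}'' k_{i2} k_{j2} k_{l2} = \sum k_{i2} \sum{}'' k_{j2} k_{l2} - \sum_{i \neq j} k_{i2}^2 k_{j2}$$
$$= \sum k_{i2} \sum{}'' k_{j2} k_{l2} - \left( \sum k_{i2}^2 \sum k_{j2} - \sum k_{i2}^3 \right)$$
$$= \sum k_{i2} \left( \sum{}'' k_{j2} k_{l2} - \sum k_{i2}^2 \right) + \sum k_{i2}^3,$$
and hence, since $\sum k_{i2}^2 = (1 - K) \left( \sum k_{i2} \right)^2$ by (23.79),
$$\frac{6 \sum'' k_{i2} k_{j2} k_{l2}}{\left( \sum k_{i2} \right)^3} = 3K - 2 + \frac{2 \sum k_{i2}^3}{\left( \sum k_{i2} \right)^3}. \tag{23.82}$$
Since all the k's here concerned are positive,
$$\sum k_{i2} \sum k_{i2}^3 \geq \left( \sum k_{i2}^2 \right)^2$$
and hence
$$\frac{\sum k_{i2}^3}{\left( \sum k_{i2} \right)^3} \geq \left\{ \frac{\sum k_{i2}^2}{\left( \sum k_{i2} \right)^2} \right\}^2 = (1 - K)^2. \tag{23.83}$$

Hence, from (23.82) and (23.83),
$$\frac{6 \sum'' k_{i2} k_{j2} k_{l2}}{\left( \sum k_{i2} \right)^3} \geq 3K - 2 + 2(1 - K)^2 = K^2 \left( 2 - \frac{1}{K} \right). \tag{23.84}$$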




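Similarly, since
$$\frac{\sum k_{i2}^3}{\left( \sum k_{i2} \right)^3} \leq \left\{ \frac{\sum k_{i2}^2}{\left( \sum k_{i2} \right)^2} \right\}^{3/2} = (1 - K)^{3/2} = (1 - K)\left( 1 - \tfrac{1}{2}K - \tfrac{1}{8}K^2 - \ldots \right) < (1 - K)\left( 1 - \tfrac{1}{2}K \right),$$
it appears that
$$\frac{6 \sum'' k_{i2} k_{j2} k_{l2}}{\left( \sum k_{i2} \right)^3} < K^2. \tag{23.85}$$

On comparing (23.75) and (23.81), and assuming that the second term in the former may
be neglected, we see that they differ by the factor whose limits we have found in (23.84)
and (23.85), namely
$$2 - \frac{1}{K} \quad \text{and} \quad 1.$$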


If K is not too small the limits are not very different from unity, and the third moments 
are accordingly in fairly good agreement. 

In the same way but with rather more complicated algebra it may be shown that the 
fourth moments are in fair agreement. 

When all the rows are rankings, the case reduces to that considered in 16.33 et seq., 
and we have already seen that the distribution of W is closely approximated by the Type I 
distribution in that case. 




23.40. Suppose, now, that we have $p$ classes of objects, one of each class belonging
to a second series of classes, $q$ in number. As our hypothesis we will suppose that membership
of the q-classes is independent of the variate-values, so that we may suppose it to be
a matter of chance how the values in any p-class are distributed among the q-classes. On
this hypothesis the variance ratio will follow the z-form approximately (subject to the
conditions we have discussed above) in the population consisting of the $(q!)^p$ permutations
of observed values; and this will be so whether the parent is normal or not.

By shaping the inference in this way, and making it conditional, we are thus able to 
apply the z-test even in cases of non-normality. The test of homogeneity still applies, but 
of course the inference is rather different from the usual type. This point has not, perhaps, 
been adequately emphasised in the past and there still seems to be confusion on the subject. 


Randomised Blocks 


23.41. The principle of testing in a conditional population has received its chief 
applications in a certain type of agricultural experiment (and analogous cases in other 
fields), known as a randomised block experiment. We are given p blocks of land and wish 
to test the existence of differential effects among q treatments, e.g. manurial treatments, 
of a crop to be grown on it. We divide each block into q plots and grow the crops on each 
of the pq plots. In any one block we apply a different treatment to each of the q plots ; 
and we allocate the treatments among the plots at random. 
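The allocation just described is easily mechanised. A minimal sketch, assuming treatments are labelled 1 to $q$ and plots within a block are taken in a fixed order; the function name and the seed argument are our own illustrative choices.

```python
import random

def randomised_block_layout(p, q, seed=None):
    """Return p lists; the i-th list gives, plot by plot, the treatment
    (numbered 1..q) allotted at random within block i."""
    rng = random.Random(seed)
    # Each block receives every treatment exactly once, in random order.
    return [rng.sample(range(1, q + 1), q) for _ in range(p)]

layout = randomised_block_layout(p=4, q=5, seed=1)
for block_number, block in enumerate(layout, start=1):
    print(f"block {block_number}: {block}")
```

Each block is randomised independently of the others, which is what generates the population of $(q!)^p$ equally likely arrangements on which the conditional test rests.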

This randomisation is an essential part of the process. If the treatments exert no 
effect the observed yields might have occurred in any order, and by making the inference 
in the proper way we are able to test in the z-distribution without assuming parent nor- 
mality or the non-existence of fertility differences between plots of the same block. If, 
of course, the parent is near to normality the test is strengthened. Had we not allocated 
the treatments at random the use of the z-distribution would not have been valid in the 
absence of normality (at least approximate) on the part of the parent. 




23.42. It is of some importance to make clear the exact hypothesis which is being
tested in this approach, since misunderstandings on the point have led to some rather
heated controversy. If the treatments are numbered 1 to $q$, we consider the possible yield
on the plot $j, k$ if it received the $l$th treatment, say $x_{jk(l)}$. In actual fact only one of these
treatments was carried out; the other values of $x_{jk(l)}$ are hypothetical and are based on
our conception of what would happen if the treatments were differently distributed. The
totality of values $x_{jk(l)}$ form our hypothetical population. We are supposing that the
observed yields can be expressed as
$$x_{jk(l)} = a_j + \varepsilon_{jk(l)},$$
where $a_j$ is an effect differing from block to block but constant within blocks, and $\varepsilon_{jk(l)}$ is the
"individual" plot effect which has a zero mean. The hypothesis we have considered in
arriving at the validity of the z-test in conditional inferences is that every treatment affects
every plot to the same extent, apart from the block effect $a_j$. In short, we suppose that
$\varepsilon_{jk(l)}$ is the same for all $l$. This is the hypothesis usually tested in data from randomised
blocks.

Neyman (1935a) proposed an alternative hypothesis, viz. that the mean effects of
treatments over all blocks were the same, on the ground that we are interested in average
treatment effects when testing fertilisers, not the effect on particular plots. The hypothesis
here is that $x_{\cdot\cdot(l)} = x_{\cdot\cdot(\cdot)}$ for every $l$, which is not the same as before; and it appears from Neyman's
analysis that the z-distribution under randomisation may not hold to such a satisfactory
approximation as in the former case. Once again we have to stress the importance of
gaining a clear idea of the hypothesis under test.


Example 23.8 (Eden and Yates, 1933; Pitman, 1938) 


Eden and Yates considered some data, based on actual experience of heights of wheat 
shoots, comprising eight classes of four, equivalent to the following measurements :— 


                               Class
   1      2      3      4      5      6      7      8
  433    455    487½   407½   452½   257½   434½   475½
  429    419½   389    574½   436½   263½   526½   473½
  383    479    463½   477½   415    392    470    423½
  437    504½   469½   452½   418    426    532    481½


The variances of the eight classes, in units of 1/12th, are then found to be
7,628; 15,702; 22,669; 59,732; 3,666; 90,593; 26,297; 8,672.
The quantity $K$ of equation (23.79) is then found to be 0.7577. The quantity
$\dfrac{(p-1)(q-1)}{pq - p + 2}$ is 0.8077. Thus (23.80) is approximately satisfied and we expect that the
z-distribution will be approximately reproduced by the data under random permutations.
This was confirmed by Eden and Yates in a sampling experiment on the data. 1000
sets of permutations were taken and z calculated for each. Agreement with expectation
was good.
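The two quantities compared in this example can be recomputed from the data. The sketch below is hedged in one respect: it assumes that every fractional entry in the table of measurements is one half, a reading which is supported by the fact that it reproduces the eight quoted variances (in twelfths). $K$ is computed as $1 - \sum k_{i2}^2 / (\sum k_{i2})^2$, which is identical with $2\sum'' k_{i2}k_{j2} / (\sum k_{i2})^2$.

```python
from statistics import variance

# Eight classes of four measurements. ASSUMPTION: fractional parts are halves.
classes = [
    [433.0, 429.0, 383.0, 437.0],
    [455.0, 419.5, 479.0, 504.5],
    [487.5, 389.0, 463.5, 469.5],
    [407.5, 574.5, 477.5, 452.5],
    [452.5, 436.5, 415.0, 418.0],
    [257.5, 263.5, 392.0, 426.0],
    [434.5, 526.5, 470.0, 532.0],
    [475.5, 473.5, 423.5, 481.5],
]
p, q = len(classes), len(classes[0])           # p = 8, q = 4

k2 = [variance(c) for c in classes]            # second k-statistics
twelfths = [round(12 * k) for k in k2]         # variances in units of 1/12

K = 1.0 - sum(k * k for k in k2) / sum(k2) ** 2
target = (p - 1) * (q - 1) / (p * q - p + 2)   # value needed for exact agreement

print(twelfths)
print(round(K, 4), round(target, 4))
```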


Example 23.9 (Friedman, 1937)


A good example of data from populations which are probably far from normal is given
in Table 23.18, showing the standard deviations of expenditures on various items for seven
income-groups. The figures relate to families of wage-earners and lower salaried workers
in Minneapolis and St. Paul, U.S.A., in 1935-6.


TABLE 23.18

Standard Deviations of Expenditure on Certain Items of Families in Specified Income Groups.

(Figures in brackets are ranks.)

Category of                        Annual Family Income (dollars).
Expenditure.           750-       1000-       1250-       1500-       1750-       2000-     2250-2500
Housing              100.3 (5)   68.4 (1)    89.5 (3)    77.9 (2)   100.0 (4)   108.2 (6)   184.9 (7)
Household operation   42.2 (1)   44.3 (3)    60.9 (4)    73.9 (6)    43.9 (2)    61.7 (5)   102.3 (7)
Food                  71.3 (1)   81.9 (2)   100.7 (7)    86.5 (3)   100.3 (5)    90.7 (4)   100.6 (6)
Clothing              37.6 (1)   60.0 (3)    57.0 (2)    60.8 (4)    71.8 (5)    83.0 (6)   117.1 (7)
Furnishings, etc.     58.3 (2)   52.7 (1)    96.0 (6)    60.4 (3)   104.3 (7)    89.8 (5)    85.8 (4)
Transportation        46.3 (1)   82.2 (2)   129.8 (3)   181.0 (6)   172.3 (5)   164.8 (4)   246.8 (7)
Recreation            19.0 (1)   23.1 (2)    38.7 (3)    45.8 (4)    59.0 (7)    50.7 (5)    55.2 (6)
Personal care          8.3 (1)    8.4 (2)     9.2 (3)    14.3 (6)    10.6 (4)    15.8 (7)    12.5 (5)
Medical care          20.1 (1)   33.5 (2)    60.1 (4)    69.3 (5)   114.3 (7)    45.3 (3)   101.6 (6)
Education              3.2 (1)    4.1 (2)    12.7 (4)    18.9 (5)     8.9 (3)    41.5 (6)    66.3 (7)
Community welfare      4.1 (1)   18.9 (5)     8.5 (2)    12.9 (3)    25.3 (7)    19.9 (6)    16.8 (4)
Vocation               7.7 (1)   11.2 (5)    10.4 (2)    10.9 (4)    10.5 (3)    14.0 (6)    14.4 (7)
Gifts                  5.3 (1)   10.9 (2)    11.2 (3)    25.3 (4)    42.3 (5)    48.8 (6)    69.4 (7)
Other                  6.0 (5)    5.6 (4)    22.2 (7)     2.5 (2)     6.2 (6)     1.0 (1)     4.0 (3)


In brackets we show the ranks of the figures for the different income-groups within each
category of expenditure. We wish to know whether the standard deviations for each
category differ significantly for the different income levels. On the hypothesis that they
do not it is a matter of chance how the ranks fall.

The sums of ranks in each column are :— 

23, 36, 53, 57, 70, 70, 83. 

The coefficient of concordance (vol. I, p. 411) is then $W = \dfrac{12S}{m^2 (n^3 - n)}$, where $m = 14$,
$n = 7$ and $S$ is the sum of squares of deviations of sums of ranks from the mean
$\dfrac{m(n+1)}{2} = 56$; we find that $S = 2620$ and $W = 0.4774$. We may test the significance
(vol. I, p. 419) by writing
$$z = \tfrac{1}{2} \log_e \frac{(m-1)W}{1-W} = 1.237,$$
$$\nu_1 = (n - 1) - \frac{2}{m} = 5.86,$$
$$\nu_2 = (m - 1)\nu_1 = 76.1.$$
The value of z is highly significant, and we conclude that standard deviation is related to 
size of income—the more money there is to spend, the more variable is the expenditure 
on particular items. 
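The arithmetic of this example is short enough to set out in full. The sketch below takes the column sums of ranks as given and uses the test in the form $z = \frac{1}{2}\log_e\{(m-1)W/(1-W)\}$ with $\nu_1 = (n-1) - 2/m$ and $\nu_2 = (m-1)\nu_1$; the variable names are ours.

```python
import math

rank_sums = [23, 36, 53, 57, 70, 70, 83]   # sums of ranks in each column
m = 14                                      # rankings (categories of expenditure)
n = len(rank_sums)                          # objects ranked (income groups)

mean_sum = m * (n + 1) / 2                  # expected sum of ranks per column
S = sum((r - mean_sum) ** 2 for r in rank_sums)
W = 12 * S / (m ** 2 * (n ** 3 - n))        # coefficient of concordance

z = 0.5 * math.log((m - 1) * W / (1 - W))
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1

print(f"S = {S:.0f}, W = {W:.4f}, z = {z:.3f}, nu1 = {nu1:.2f}, nu2 = {nu2:.1f}")
```

The sums of ranks must total $\frac{1}{2}mn(n+1) = 392$, which gives a quick check that the ranks in the table have been read correctly.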


NOTES AND REFERENCES 


The idea of comparing variance between classes with the variance within classes in 
order to test homogeneity is found as early as Lexis (see footnote on page 119). Modern 
developments, and particularly the exact test of significance for normal parents, are due 
mainly to R. A. Fisher. Apart from papers by Irwin (1931 and 1934), connected accounts 
of the theory of variance analysis are hard to find, many points of theoretical interest being 
scattered among papers which are primarily practical. 

For the general theory and applications reference may be made to Fisher's Statistical
Methods (1925a, 1944) and Design of Experiments (1935c, 1942), to a useful introductory
account by Goulden (1939), and to the writings of Yates, particularly his Design and Analysis
of Factorial Experiments (1937b).

On the question of randomisation in preserving the z-distribution see Eden and Yates 
(1933), Welch (1937, 1938a), and Pitman (1938). References to work on ranking are given 
at the end of Chapter 16. 

For work on the distribution of the greatest of a set of variances see Fisher (1929a,
1940a), Cochran (1941), Stevens (1939a), Hartley (1938), and Finney (1941a). For further
work on the square-root and $\sin^{-1}$ transformations see Cochran (1940b), Beall (1942) and
Curtiss (1943).

The literature of this subject is now very large. Some further references are given 
at the end of the next chapter. 


EXERCISES 


23.1. If $x_j$ ($j = 1 \ldots n$) are a set of normal independent variates with variances
$1/w_j$, consider the transformation
$$u_j = \sum_{k} l_{jk}\, x_k \sqrt{w_k},$$


where the l’s are defined by 
Ly = ‘ W;,/ 21) 


k=1.. 
Jeon /(Qe)(Qe)p tshiga 
n= ~ Sim //\ (Se) (Ya) ae 7 ea 
t=1 i=1 =i aacch 
‘ ja2, 3... 
ot) 


Show that the $l$'s are orthogonal and hence that
$$\sum_{j=1}^{n} u_j^2 = \sum_{k=1}^{n} w_k x_k^2$$
is distributed as $\chi^2$ with $n$ degrees of freedom. Noting that $u_1 = \sum_k w_k x_k / \sqrt{\sum w}$ is distributed
normally with unit variance independently of $u_2 \ldots u_n$, show that
$$\sum_{k=1}^{n} w_k (x_k - \bar{x})^2$$
is distributed as $\chi^2$ with $n - 1$ degrees of freedom.




Hence derive the z-test for the analysis of variance with unequal members in a one-way 
classification. 
(Irwin, 1942.) 


23.2. Verify the arithmetic in the analysis of variance of Example 23.5. 
23.3. Verify the arithmetic in the analysis of variance of Example 23.6. 


23.4. In a bivariate table with k rows (different rows corresponding to different 
values of the x-variate) write 


1 2. Sel 

ie "ee 9) 
als 2 

aie (Nz Sz); 


where o? is the variance of the y variate, s? the variance, and n, the frequency in the row 
with variate-value x Thus 


and the ratio on the right is the variance-ratio in a one-way classification with unequal 
numbers. ; 
Show that, for any form of population, 


EQ =a E(q)=N—k 


= dl | oe Wh 
varh = 2(k—1) + (B— 8) {25+ = } 


varg = 2(N —&) + (Bi. ~8)42-- +N — 24h 


cov (Iq) = (Bs — 8) 4k — 1 +2 -Z=1. 


xz Ny 

Hence, approximately, that 

h E at “var q cov (h, q) \ 
E(-) =——<1+ ~~ 

(i) E (q) B*(q) Eh) Eg) 
()" _ E?(h) {! varh 4 cov (h, q) + ora} 
q (7) E*(h) E(k) Eq) Eq) J 
In the case when all rows contain the same frequency 


and then | B(4)\=4—=5 {1 +53} 
var (o | = a 


Hence show that the mean and variance of the variance-ratio are, to this order, independent 
of the distribution of y, indicating that the z-test is not very sensitive to deviations from 
normality. 
(E. S. Pearson, 1931b. It is rather remarkable that the correlation of h and q, far from
disturbing the z-distribution, contributes to its stability.)


CHAPTER 24 
THE ANALYSIS OF VARIANCE—(2) 


Estimation of Class-differences 

24.1. In the previous chapter we considered the analysis of variance mainly as the 
provider of tests of homogeneity. We have now to examine in more detail the problem of 
estimating class-effects, assuming that the homogeneity tests have shown them to exist. 
We discuss in the first instance the case in which there is only one member in each sub- 
class, and for the sake of simplicity confine ourselves to a two-way classification, though 
the theory is quite general. 

The fundamental hypothesis to be examined is that the data may be expressed in
the form
$$x_{jk} = a_j + b_k + \zeta_{jk}, \tag{24.1}$$
where $a_j$ and $b_k$ represent class-effects and $\zeta$ is a random normal variate with zero mean.
Our analysis of variance will have shown whether this is an acceptable hypothesis, and
our present problem is to estimate the unknown values of a's and b's from the observed x's.


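24.2. The joint probability of the $\zeta$'s is
$$dF \propto \exp\left\{ -\frac{1}{2v} \sum_{j,k} (x_{jk} - a_j - b_k)^2 \right\} d\zeta_{11} \ldots d\zeta_{pq}, \tag{24.2}$$
where $v$ is the variance of $\zeta$, and in conformity with the notation used in the previous chapter
we have $p$ A-classes and $q$ B-classes. The maximum likelihood estimates of the a's and
b's are then those which minimise the sum in curly brackets in (24.2), that is to say, the
least-squares solution of the equations (24.1). In the usual way we find
$$\sum_{k=1}^{q} (x_{jk} - a_j - b_k) = 0, \quad j = 1 \ldots p,$$
$$\sum_{j=1}^{p} (x_{jk} - a_j - b_k) = 0, \quad k = 1 \ldots q, \tag{24.3}$$
which reduce to
$$x_{j\cdot} - a_j - b_{\cdot} = 0,$$
$$x_{\cdot k} - a_{\cdot} - b_k = 0, \tag{24.4}$$
where $x_{j\cdot}$, $x_{\cdot k}$ and $x_{\cdot\cdot}$ denote the mean of the $j$th A-class, the mean of the $k$th B-class and
the total mean, and $a_{\cdot}$, $b_{\cdot}$ the means of the a's and b's.
Summing the first equation over $j$, dividing by $p$, and subtracting from the first, we obtain
$$x_{j\cdot} - x_{\cdot\cdot} = a_j - a_{\cdot}, \quad j = 1 \ldots p, \tag{24.5}$$
and similarly
$$x_{\cdot k} - x_{\cdot\cdot} = b_k - b_{\cdot}, \quad k = 1 \ldots q. \tag{24.6}$$
In (24.5) there are $p$ equations, but if we sum them all we reach the identity 0 = 0, so that
only $p - 1$ are independent. There is thus an element of indeterminacy which we may
remove by supposing that $a_{\cdot} = 0$. Similarly we may take $b_{\cdot} = 0$, and then we have
$$a_j = x_{j\cdot} - x_{\cdot\cdot}, \quad j = 1 \ldots p, \tag{24.7}$$
$$b_k = x_{\cdot k} - x_{\cdot\cdot}, \quad k = 1 \ldots q. \tag{24.8}$$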




Our estimate of any class-effect is equal to the deviation of the mean in that class from 
the total mean. 


24.3. Evidently similar equations arise in the general n-way classification. We shall 
see below that they break down for unequal numbers in subclasses, except in a special 
case when the numbers are proportionate. 

The assumption that $a_j$ and $b_k$ have zero means is not, in effect, a restriction on generality
but only a convention. If we prefer it, we may consider the slightly more general
hypothesis that $\zeta$ has a mean $m$, in which case we have to minimise
$$\sum_{j,k} (x_{jk} - a_j - b_k - m)^2. \tag{24.9}$$
This will be found to lead back to equations (24.7) and (24.8), with the additional equation
for estimating $m$
$$m = x_{\cdot\cdot}. \tag{24.10}$$
Or again, if we prefer to absorb $m$ into the a-effects we have
$$a_j = x_{j\cdot}, \qquad b_k = x_{\cdot k} - x_{\cdot\cdot}, \tag{24.11}$$
the mean of $a_j$ in this case not vanishing. Which form we use is a matter of convenience.
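A small numerical illustration may make the estimates concrete. The sketch below uses a hypothetical $3 \times 4$ table with one member in each subclass, forms the estimates as deviations of class means from the total mean (with $m$ estimated by the total mean), and verifies that the residuals satisfy the least-squares equations, that is, that they sum to zero in every A-class and every B-class. Data and names are illustrative assumptions.

```python
# Hypothetical table x[j][k]: p = 3 A-classes, q = 4 B-classes.
x = [
    [12.0, 15.0, 11.0, 14.0],
    [9.0, 13.0, 10.0, 12.0],
    [14.0, 18.0, 13.0, 19.0],
]
p, q = len(x), len(x[0])

grand_mean = sum(map(sum, x)) / (p * q)
a_class_means = [sum(row) / q for row in x]
b_class_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]

m = grand_mean                                     # estimate of the mean
a = [mean - grand_mean for mean in a_class_means]  # A-class effects
b = [mean - grand_mean for mean in b_class_means]  # B-class effects

residual = [[x[j][k] - a[j] - b[k] - m for k in range(q)] for j in range(p)]

# Least-squares conditions: residuals sum to zero over k for each j,
# and over j for each k.
a_class_sums = [sum(row) for row in residual]
b_class_sums = [sum(residual[j][k] for j in range(p)) for k in range(q)]

print("a:", a)
print("b:", b)
print("residual sums:", a_class_sums, b_class_sums)
```

Each $a_j$ here depends only on the members of its own A-class and on the total mean, which is the property discussed in the next section.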


24.4. It is important to notice that the equations of estimation which we have just 
reached give each a; and b, independently of values in other classes. We obtain the same 
equation for a; whether we happen to be estimating other a’s and b’s or not. This property, 
as we shall see shortly, fails to hold if the numbers in subclasses are disproportionate. 
The situation is similar to that in which we can determine the constants in a regression 
line independently of the others if orthogonal polynomials are used, in that each constant 
is given by a separate equation not containing any of the others. Data of this kind are 
called orthogonal. 

The direct comparison of class-means which is possible with orthogonal data can be
seen, from general considerations, to be legitimate. In comparing $x_{i\cdot} - x_{\cdot\cdot}$ with $x_{j\cdot} - x_{\cdot\cdot}$,
the estimates of the effects in the $i$th and $j$th A-classes, we are in each case averaging over
$q$ B-classes with one member in each. The B-classes, therefore, affect each mean to the
same extent and do not affect their difference. If there are more members in some subclasses
than in others, the means are unequally weighted with different B-effects and
the comparison is invalidated.


24.5. Regarding $x_{j\cdot} - x_{\cdot\cdot}$ as the estimate of $a_j$ and $x_{\cdot k} - x_{\cdot\cdot}$ as the estimate of $b_k$,
we see that the familiar equation
$$\sum (x_{jk} - x_{\cdot\cdot})^2 = \sum (x_{j\cdot} - x_{\cdot\cdot})^2 + \sum (x_{\cdot k} - x_{\cdot\cdot})^2 + \sum (x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2, \tag{24.12}$$
in which every sum extends over all $pq$ members, can be regarded as an analysis of the sum
of squares on the left, which has $pq - 1$ degrees of freedom, into terms in which there is one
degree of freedom for every fitted constant and a residual with $(p - 1)(q - 1)$ degrees of
freedom. Every constant fitted reduces the number of degrees of freedom in the residual by unity.
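The analysis of the sum of squares and of the degrees of freedom can be confirmed on any orthogonal table. A minimal sketch on hypothetical data, with every sum taken over all $pq$ members (so that the A-class term is $q$ times the sum of squared A-effects, and similarly for B):

```python
# Hypothetical table x[j][k]: p = 3 A-classes, q = 4 B-classes.
x = [
    [12.0, 15.0, 11.0, 14.0],
    [9.0, 13.0, 10.0, 12.0],
    [14.0, 18.0, 13.0, 19.0],
]
p, q = len(x), len(x[0])

grand = sum(map(sum, x)) / (p * q)
a_means = [sum(row) / q for row in x]
b_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]

ss_total = sum((x[j][k] - grand) ** 2 for j in range(p) for k in range(q))
ss_a = q * sum((mean - grand) ** 2 for mean in a_means)   # summed over all pq
ss_b = p * sum((mean - grand) ** 2 for mean in b_means)   # summed over all pq
ss_residual = sum((x[j][k] - a_means[j] - b_means[k] + grand) ** 2
                  for j in range(p) for k in range(q))

df_total = p * q - 1
df_a, df_b = p - 1, q - 1            # one for each independent fitted constant
df_residual = (p - 1) * (q - 1)

print(ss_total, ss_a + ss_b + ss_residual)
print(df_total, df_a + df_b + df_residual)
```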




Unequal Numbers in Subclasses 


24.6. For a one-way classification we have already considered (23.7 and 23.8) the 
case where the numbers in subclasses are unequal. It was seen that the total sum of squares 
could be expressed as a sum between classes and a residual which were independently 
distributed and whose ratio therefore provided a homogeneity test in the usual way. 

When we try to extend this result to two-way or generally to n-way classifications, 
we begin to run into difficulties. We can still find, as shown below, an estimator of v based 
on p — 1 degrees of freedom and differences between A-classes, and one with q — 1 df. 
based on differences between B-classes; but these are no longer independent, and conse- 
quently we cannot subtract their sum from the total sum of squares in order to obtain 
a residual or an interaction term which also provides an unbiassed estimator. 

On the other hand, there is now available an independent estimator of v which did 
not appear in the orthogonal case where only one member was included in each subclass. 
In fact, since there are several members in any given subclass, we can find an estimator 
of v based on those members alone ; and we may pool all such to form an estimator with 
N — pq degrees of freedom, where there are pq subclasses. This estimator will be inde- 
pendent of subclass means and any estimators based on them, and hence provides 
a “residual ’’ such as we require to carry out homogeneity tests. 


24.7. Suppose we have a two-way classification into $p$ A-classes and $q$ B-classes, and
let the number of members in the subclass $A_j B_k$ be $n_{jk}$. Let $\bar{x}_{jk}$ be the mean of these
members. We may array the means as
$$\begin{matrix} \bar{x}_{11} & \bar{x}_{12} & \cdots & \bar{x}_{1q} \\ \bar{x}_{21} & \bar{x}_{22} & \cdots & \bar{x}_{2q} \\ \vdots & \vdots & & \vdots \\ \bar{x}_{p1} & \bar{x}_{p2} & \cdots & \bar{x}_{pq} \end{matrix} \tag{24.13}$$

Now we may, in the first instance, test for homogeneity by ignoring the differences
between A- and B-classification and merely regarding the data as a one-way classification
with $pq$ classes. The usual test for homogeneity is then applicable. The sum of squares
between means of classes will have $pq - 1$ degrees of freedom, the total $N - 1$ d.f., and
the residual $N - 1 - (pq - 1) = N - pq$ d.f. This residual, in fact, is the one mentioned
in the previous section, and is based on the pooled sums of squares within the $pq$
classes. The other term based on $pq - 1$ degrees of freedom is the sum
$$\sum n_{jk} (\bar{x}_{jk} - \bar{x}_{\cdot\cdot})^2$$
and is derivable from the array (24.13).


24.8. To test the effect of A-classification separately we proceed as follows:
Any $\bar{x}_{jk}$ is the mean of $n_{jk}$ values and, on the usual hypothesis as to normality, will
have variance $\dfrac{v}{n_{jk}}$. If $\bar{x}_{\cdot\cdot}$ is the mean of all $N$ values we have
$$\bar{x}_{\cdot\cdot} = \frac{1}{N} \sum_{j,k} n_{jk} \bar{x}_{jk}. \tag{24.14}$$

Let the marginal unweighted means in (24.13) be $\xi_{j\cdot}$, $\xi_{\cdot k}$, so that
$$\xi_{j\cdot} = \frac{1}{q} \sum_k \bar{x}_{jk}, \qquad \xi_{\cdot k} = \frac{1}{p} \sum_j \bar{x}_{jk}. \tag{24.15}$$

On the hypothesis of homogeneity the variance of $\xi_{j\cdot}$ is given by
$$\frac{v}{q^2} \left( \frac{1}{n_{j1}} + \frac{1}{n_{j2}} + \ldots + \frac{1}{n_{jq}} \right) = \frac{v}{N_j}, \tag{24.16}$$
where
$$\frac{1}{N_j} = \frac{1}{q^2} \sum_k \frac{1}{n_{jk}}. \tag{24.17}$$

Now let us regard the means $\xi_{j\cdot}$ as the means in $p$ classes whose numbers are $N_j$, as
is legitimate from (24.16). Then writing
$$c = \frac{\sum N_j \xi_{j\cdot}}{\sum N_j}, \tag{24.18}$$
we have for an unbiassed estimator of $v$
$$\frac{1}{p-1} \sum N_j (\xi_{j\cdot} - c)^2 = \frac{1}{p-1} \left\{ \sum (N_j \xi_{j\cdot}^2) - c^2 \sum N_j \right\}. \tag{24.19}$$
This estimator has $p - 1$ degrees of freedom and is distributed as $v\chi^2/(p-1)$. (This follows from
the one-way case except that $N_j$ may not be integral; and its general truth may be established
as in Exercise 23.1.) It is independent of the residual with $N - pq$ d.f., and hence
the A-effects may be tested separately.
Similarly, if
$$\frac{1}{M_k} = \frac{1}{p^2} \sum_j \frac{1}{n_{jk}}, \tag{24.20}$$
an unbiassed estimator of $v$ is given by
$$\frac{1}{q-1} \left\{ \sum_k (M_k \xi_{\cdot k}^2) - d^2 \sum_k M_k \right\}, \tag{24.21}$$
where
$$d = \frac{\sum_k M_k \xi_{\cdot k}}{\sum_k M_k}, \tag{24.22}$$
and this also may be compared with the independent estimator based on $N - pq$ d.f.


Example 24.1 (data from Brandt (1933) considered by Yates (1934a) ) 


Table 24.1 shows, for a number of breeds of pig, the numbers of each breed, 
divided into male and female, and the total logarithm of the percentage bacon yielded by 
the slaughtered carcases. The logarithm has been taken so as to normalise the variate. 




TABLE 24.1

Numbers and Logarithm of Percentage Bacon in Breeds of Pigs.

                          Female.                      Male.
Breed.             Number.   Total Log.        Number.   Total Log.
                             Percent. Bacon.             Percent. Bacon.
Hampshire             33        66.55             89       181.04
Duroc Jersey          51        98.69            141       281.43
Tamworth              13        25.90             17        34.20
Yorkshire              4         7.62              9        17.58
Berkshire              8        14.64              4         8.20
Poland China          15        28.11             32        64.42
Chester White         35        66.90             47        90.52
Others                12        23.32             23        46.70
TOTALS               171       331.73            362       724.09


The total sum of squares, which is not obtainable from this table as it stands, we quote
as 13.0142.
The class-means and reciprocals of class-frequencies are given in Table 24.2.

TABLE 24.2
Class-Means and Reciprocals of Class-Frequencies for the Data of Table 24.1.

                         Female.                   Male.             Unweighted
Breed.              Mean.      1/n_jk         Mean.      1/n_jk     Mean of Means.
Hampshire         2.016,667   0.030,30     2.034,158   0.011,24      2.025,412
Duroc Jersey      1.935,099   0.019,61     1.995,958   0.007,09      1.965,528
Tamworth          1.992,307   0.076,92     2.011,765   0.058,82      2.002,036
Yorkshire         1.905,000   0.250,00     1.953,333   0.111,11      1.929,167
Berkshire         1.830,000   0.125,00     2.050,000   0.250,00      1.940,000
Poland China      1.874,000   0.066,67     2.013,125   0.031,25      1.943,562
Chester White     1.911,429   0.028,57     1.925,958   0.021,28      1.918,694
Others            1.943,333   0.083,33     2.030,434   0.043,48      1.986,884
Unweighted Mean                (Total)                  (Total)
of Means          1.925,979   0.680,40     2.001,841   0.534,27      1.963,910




Taking first the classification into male and female ($q = 8$), we find, from the relations
$$\frac{1}{N_j} = \frac{1}{q^2} \sum_k \frac{1}{n_{jk}},$$
$$N_1 = \frac{64}{0.680{,}40} = 94.0623,$$
$$N_2 = \frac{64}{0.534{,}27} = 119.7896.$$

Then, from (24.18),
$$c = \frac{\sum N_j \xi_{j\cdot}}{\sum N_j} = \frac{(94.0623 \times 1.925{,}979) + (119.7896 \times 2.001{,}841)}{94.0623 + 119.7896} = 1.968{,}474.$$

Thus our estimate of $v$, with one degree of freedom,
$$= \sum (N_j \xi_{j\cdot}^2) - c^2 \left( \sum N_j \right) = 0.3032.$$
Similarly for the eight breed-classes we find an estimate of $v$ with seven degrees of
freedom to be $\dfrac{0.6056}{7} = 0.0865$.

Considering the 16 subclasses as a one-way classification, we find the following
preliminary analysis (the arithmetical details of which we omit):

TABLE 24.3
Analysis of Variance of Data in Table 24.1.

                      Sum of Squares.   d.f.   Quotient.
Between classes            1.2715         15     0.0848
Residual                  11.7427        517     0.0227
TOTALS                    13.0142        532

The variance ratio here gives a value of z equal to 0.659, which is significant. Thus the
data are not homogeneous.

We now require to decide whether the departure from homogeneity is due to either
breed or sex or to a combination of the two. For sex-differences we have found an estimate
of $v$ equal to 0.3032 with one d.f. Comparing this with the independent residual from
Table 24.3 of 0.0227 with 517 d.f., we find that the effect of sex is significant. Similarly,
for breed, the estimate of $v$ is 0.0865 for 7 d.f., which again is significant. We conclude
that both breed and sex influence the departure from homogeneity.
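The two estimates of $v$ used in this example can be recomputed from the numbers and totals of Table 24.1. The sketch below is one way of arranging the work: each class to be compared receives the weight $N_j$ defined by $1/N_j = (1/q^2)\sum_k 1/n_{jk}$, where $q$ is the number of subclasses averaged over, and the weighted sum of squares of the unweighted means about their weighted mean is divided by its degrees of freedom. The function and variable names are our own.

```python
# Numbers and total log-percentages by breed, from Table 24.1.
n_female = [33, 51, 13, 4, 8, 15, 35, 12]
n_male = [89, 141, 17, 9, 4, 32, 47, 23]
t_female = [66.55, 98.69, 25.90, 7.62, 14.64, 28.11, 66.90, 23.32]
t_male = [181.04, 281.43, 34.20, 17.58, 8.20, 64.42, 90.52, 46.70]

def variance_estimate(counts, totals):
    """counts[j][k], totals[j][k]: classes j are compared, k is averaged over.
    Returns the estimate of v on len(counts) - 1 degrees of freedom."""
    q = len(counts[0])
    weights, means = [], []
    for n_row, t_row in zip(counts, totals):
        weights.append(q * q / sum(1.0 / n for n in n_row))         # N_j
        means.append(sum(t / n for t, n in zip(t_row, n_row)) / q)  # xi_j.
    c = sum(w * xi for w, xi in zip(weights, means)) / sum(weights)
    ss = sum(w * (xi - c) ** 2 for w, xi in zip(weights, means))
    return ss / (len(counts) - 1)

v_sex = variance_estimate([n_female, n_male], [t_female, t_male])
v_breed = variance_estimate(list(zip(n_female, n_male)),
                            list(zip(t_female, t_male)))

residual = 0.0227            # quotient on 517 d.f. quoted from Table 24.3
print(f"sex:   {v_sex:.4f} on 1 d.f., ratio {v_sex / residual:.1f}")
print(f"breed: {v_breed:.4f} on 7 d.f., ratio {v_breed / residual:.1f}")
```

The residual quotient is taken from the table rather than recomputed, since the individual observations are not given.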




It is particularly important to note that since the estimates between breeds and between
sexes are dependent, we cannot analyse the variance as follows:

TABLE 24.4
Incorrect Form of Analysis of Variance of Data of Table 24.1.

                      Sum of Squares.   d.f.   Quotient.
Between sexes              0.3032          1     0.3032
Between breeds             0.6056          7     0.0865
Interaction                0.3627          7     0.0518
Residual                  11.7427        517     0.0227
TOTALS                    13.0142        532

In fact the term shown as "interaction", calculated so as to make the sums of squares
and degrees of freedom additive in the usual way, is not an unbiassed estimate of $v$. This
is a critical point of difference between the orthogonal and the non-orthogonal case.


24.9. Suppose that the homogeneity test has shown the existence of significant
class-effects. As before, we turn to consider the hypothesis that the data can be expressed
as the sum of A- and B-effects separately with a random normal residual. Let $x_{jkl}$ be
the typical member of the $(j, k)$th subclass, $l$ varying from 1 to $n_{jk}$. Our hypothesis is then
$$x_{jkl} = a_j + b_k + \zeta_{jkl}, \tag{24.23}$$
where $\zeta$ is normal with variance $v$. For convenience we will regard the mean of $\zeta$ as absorbed
in the coefficients $a$, so that we may take $\zeta$ to have zero mean.

The usual process of estimation of the a's and b's leads to the minimisation of the
sum over all $N$ values of
$$\sum (x_{jkl} - a_j - b_k)^2.$$
Differentiating with respect to $a_j$ and $b_k$, we find the series of equations
$$\sum_k \sum{}'' (x_{jkl} - a_j - b_k) = 0, \quad j = 1 \ldots p,$$
$$\sum_j \sum{}'' (x_{jkl} - a_j - b_k) = 0, \quad k = 1 \ldots q, \tag{24.24}$$
where $\sum''$ denotes summation over the $n_{jk}$ values in a subclass. These equations reduce to
$$\sum_k n_{jk} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk},$$
$$\sum_j n_{jk} a_j + \sum_j n_{jk} b_k = \sum_j n_{jk} \bar{x}_{jk}. \tag{24.25}$$
Writing $N_{j\cdot}$ for $\sum_k n_{jk}$ and $N_{\cdot k}$ for $\sum_j n_{jk}$, we have
$$N_{j\cdot} a_j + \sum_k n_{jk} b_k = \sum_k n_{jk} \bar{x}_{jk}, \quad j = 1 \ldots p, \tag{24.26}$$
$$\sum_j n_{jk} a_j + N_{\cdot k} b_k = \sum_j n_{jk} \bar{x}_{jk}, \quad k = 1 \ldots q. \tag{24.27}$$
To which we may add
$$\sum_k b_k = 0.$$
k 




Had we chosen to absorb the mean of $\zeta$ into the b's, this last equation would be replaced
by $\sum_j a_j = 0$.

When all the n's are equal these equations reduce to the orthogonal case, and each
a- or b-coefficient can be independently estimated. In the contrary case the equations
have to be solved as they stand.


Example 24.2 


Returning to the data of Table 24.1, we find for equations (24.26) and (24.27) the
following, the values of the constants required being obtainable from the body or marginal
sums of the table itself:

$$\begin{aligned}
171a_1 + 33b_1 + 51b_2 + 13b_3 + 4b_4 + 8b_5 + 15b_6 + 35b_7 + 12b_8 &= 331.73 \\
362a_2 + 89b_1 + 141b_2 + 17b_3 + 9b_4 + 4b_5 + 32b_6 + 47b_7 + 23b_8 &= 724.09
\end{aligned}$$

$$\begin{aligned}
33a_1 + 89a_2 + 122b_1 &= 247.59 \\
51a_1 + 141a_2 + 192b_2 &= 380.12 \\
13a_1 + 17a_2 + 30b_3 &= 60.10 \\
4a_1 + 9a_2 + 13b_4 &= 25.20 \\
8a_1 + 4a_2 + 12b_5 &= 22.84 \\
15a_1 + 32a_2 + 47b_6 &= 92.53 \\
35a_1 + 47a_2 + 82b_7 &= 157.42 \\
12a_1 + 23a_2 + 35b_8 &= 70.02
\end{aligned}$$

To which we may add $a_1 + a_2 = 0$.
The solutions are

$$a_1 = -0.026507, \qquad a_2 = 0.026507;$$

$$b_1 = 2.017259; \quad b_2 = 1.967367; \quad b_3 = 1.999799; \quad b_4 = 1.928267;$$

$$b_5 = 1.912169; \quad b_6 = 1.959136; \quad b_7 = 1.915877; \quad b_8 = 1.992241.$$

These give us the "best" estimates of the mean effects of sex and breed on the
hypothesis expressed by (24.23).
The mean of the $b$'s is 1.961514, which may be taken as an estimate of the mean of $\zeta$,
the $b$-effects then being the differences of the above $b$-values from this mean.
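The arithmetic of this example can be checked numerically. The following is a minimal sketch, not the authors' method of solution: it assembles equations (24.26) and (24.27) from the subclass numbers, adds the side condition $a_1 + a_2 = 0$, and solves by least squares with NumPy. The last breed total is taken as 70.02, the value implied by the grand total 1055.82; that figure and all variable names are assumptions of the sketch.

```python
import numpy as np

# Subclass numbers n_jk read off the coefficients of the normal equations:
# rows are the two sexes (j = 1, 2), columns the eight breeds (k = 1..8).
n = np.array([
    [33,  51, 13, 4, 8, 15, 35, 12],
    [89, 141, 17, 9, 4, 32, 47, 23],
], dtype=float)

# Marginal sums of the observations (right-hand sides of the equations).
sex_totals = np.array([331.73, 724.09])
# The last entry (70.02) is an assumption: it is the value implied by the
# grand total 1055.82 less the other seven breed totals.
breed_totals = np.array([247.59, 380.12, 60.10, 25.20,
                         22.84, 92.53, 157.42, 70.02])

p, q = n.shape
rows, rhs = [], []

# Equations (24.26): N_j. a_j + sum_k n_jk b_k = total for sex j.
for j in range(p):
    row = np.zeros(p + q)
    row[j] = n[j].sum()
    row[p:] = n[j]
    rows.append(row)
    rhs.append(sex_totals[j])

# Equations (24.27): sum_j n_jk a_j + N_.k b_k = total for breed k.
for k in range(q):
    row = np.zeros(p + q)
    row[:p] = n[:, k]
    row[p + k] = n[:, k].sum()
    rows.append(row)
    rhs.append(breed_totals[k])

# Side condition a_1 + a_2 = 0 (the mean is carried by the b's).
row = np.zeros(p + q)
row[:p] = 1.0
rows.append(row)
rhs.append(0.0)

# One normal equation is redundant, so the stacked system is consistent;
# least squares returns its unique solution.
solution, *_ = np.linalg.lstsq(np.vstack(rows), np.array(rhs), rcond=None)
a, b = solution[:p], solution[p:]

print("a =", np.round(a, 6))
print("b =", np.round(b, 6))
print("mean of b =", round(float(b.mean()), 6))
```

Least squares is used rather than a direct solve because the ten normal equations are linearly dependent until the side condition is added.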


24.10. Let us now consider the analysis of variance in the non-orthogonal case, 
when constants have been fitted by least squares in the above-mentioned way. 

To make the discussion clearer we will regard the estimation as relating to $p$ constants
$a_j$, related by $\sum (a_j) = 0$, $q$ constants $b_k$, related by $\sum (b_k) = 0$, and the mean $m$. There
are thus $p + q - 1$ independent constants which, in effect, provide estimates of the means
of subclasses. Whatever these means really are, the residual quotient based on $N - pq$
degrees of freedom gives an unbiassed estimator of $v$, the common variance. We have
now to analyse the remaining sum of squares based on $pq - 1$ d.f.

If the true (population) values of the constants are denoted by $\alpha_j$, $\beta_k$ and $\mu$, the sum

$$\sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2$$

is distributed as $v\chi^2$ with $N$ degrees of freedom. Developing yet another variation on
a familiar theme, we show that the corresponding quantity

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum (x_{jkl} - \alpha_j - \beta_k - \mu)^2 - \sum (a_j - \alpha_j)^2 - \sum (b_k - \beta_k)^2 - \sum (m - \mu)^2 \tag{24.29}$$

is distributed as $v\chi^2$ with $N - (p + q - 1)$ d.f.




In fact, equations (24.26) and (24.27) show that the estimators $a$, $b$ (and in our present
case $m$ also) are linear in the variables $x$. We can then find $p + q - 1$ orthogonal normal
variables in terms of which they can be expressed. Their sum of squares will be distributed
as $v\chi^2$ with $p + q - 1$ degrees of freedom (not some multiple of $\chi^2$ because the mean value
must be $p + q - 1$ in virtue of 18.17). Thus the remaining term $\sum (x_{jkl} - a_j - b_k - m)^2$
is distributed as $v\chi^2$ with $N - (p + q - 1)$ degrees of freedom, independently of the portion
due to the constants $a$, $b$ and $m$.

Furthermore, the actual reduction in sums of squares, equivalent to the sum of the
last three terms in (24.29), may be easily determined. Precisely as in the similar problem
of evaluating residuals in a regression equation, we have

$$\sum (x_{jkl} - a_j - b_k - m)^2 = \sum x_{jkl}^2 - \sum_j a_j \sum_{k,l} x_{jkl} - \sum_k b_k \sum_{j,l} x_{jkl} - m \sum x_{jkl}, \tag{24.30}$$

where, of course, summation takes place over all values.


24.11. The total sum of squares is already calculated about the estimated mean
$m$, so that the reduction for the term $\sum m^2 = N\bar{x}^2$ has already been taken into account.
The total sum is then distributed as $v\chi^2$ with $N - 1$ d.f., as we already know. We know
further that we can split off the independent residual sum based on $N - pq$ degrees of
freedom. This leaves us with a sum based on $pq - 1$ d.f. From the previous section it
follows that we can analyse this sum into two parts: (a) the sum of squares due to fitting
the constants $a_j$ and $b_k$, accounting for $p + q - 2$ d.f., and (b) the remainder based on
$pq - 1 - (p + q - 2) = (p - 1)(q - 1)$ d.f. This remainder is independent of the sum
of squares due to fitting constants and provides an unbiassed estimator of $v$. If the ratio,
as compared with the residual based on $N - pq$ d.f., is significant, the hypothesis of additive
effects breaks down. In short, we may regard this quantity as an interaction term.


24.12. One important point to notice in this connection is that the interaction term 
depends on whether p + q — 2 or fewer constants are fitted. In the orthogonal case we 
can determine an interaction term once and for all, however things stand in regard to the 
estimation of inter-class effects; but for non-orthogonal data the number of class-effects 
estimated affects the interaction term, and if necessary a new significance test has to be 
applied if further estimates are calculated. The situation is similar to the testing of 
regression coefficients when orthogonal polynomials are not employed. 


Example 24.3 


Returning again to the data discussed in Examples 24.1 and 24.2, let us regard the
means in all 16 subclasses as simultaneously under estimate. For the reduction in sum
of squares due to the constants we find, using the values of $a$ and $b$ found in Example 24.2,

$$0.026507\,(-331.73 + 724.09) + (2.017259 \times 247.59) + (1.967367 \times 380.12) + \ldots - \frac{(1055.82)^2}{533} = 1.0415.$$

Here, for instance, the sum $\sum a_j x_{jkl}$ is given by multiplying $a_j$ by the term $\sum_k n_{jk} \bar{x}_{jk}$ already
found. The last term removes the effect of including the mean among the $b$'s.




The sum of squares between classes was found in Example 24.1 to be 1.2715, based
on 15 d.f. We then have

                                            Sum of Squares.   d.f.   Quotient.
Sex and breed (estimation of constants)         1.0415          8     0.1302
Interaction . . . . . . . . . . . . . .         0.2300          7     0.0329
Between classes . . . . . . . . . . . .         1.2715         15

Comparing the interaction term 0.0329 (7 d.f.) with the residual 0.0227 (517 d.f.) we see
that it is not significant.
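The reduction and the interaction quotient can be recomputed from the printed figures. This is a sketch under stated assumptions: the fitted constants are the rounded values quoted in Example 24.2, the eighth breed total is taken as 70.02 (the value implied by the grand total), and the residual sum of squares 11.7427 on 517 d.f. is read from the earlier analysis. All variable names are illustrative.

```python
# Fitted constants quoted in Example 24.2 (a_1 = -a_2 by the side condition).
a = [-0.026507, 0.026507]
b = [2.017259, 1.967367, 1.999799, 1.928267,
     1.912169, 1.959136, 1.915877, 1.992241]

sex_totals = [331.73, 724.09]
# Last breed total (70.02) is assumed: grand total minus the other seven.
breed_totals = [247.59, 380.12, 60.10, 25.20, 22.84, 92.53, 157.42, 70.02]
grand_total = sum(sex_totals)          # 1055.82
N = 533

# Reduction due to the constants:
#   sum_j a_j X_j. + sum_k b_k X_.k - G^2 / N
reduction = (sum(aj * t for aj, t in zip(a, sex_totals))
             + sum(bk * t for bk, t in zip(b, breed_totals))
             - grand_total ** 2 / N)

between_classes = 1.2715               # 15 d.f., from Example 24.1
interaction = between_classes - reduction
interaction_quotient = interaction / 7

residual_quotient = 11.7427 / 517      # assumed residual line of the earlier table
variance_ratio = interaction_quotient / residual_quotient

print(round(reduction, 4), round(interaction, 4), round(variance_ratio, 2))
```

Because the reduction is a small difference of large terms, the six-figure constants reproduce it only to about three decimal places.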

If we neglect sex and consider breed alone, we have only to estimate eight constants
$b_1, \ldots, b_8$ subject to $\sum (b) = 0$. The sum of squares for breed alone is given by

$$\frac{1}{122}(247.59)^2 + \frac{1}{192}(380.12)^2 + \ldots - \frac{1}{533}(1055.82)^2 = 0.7253.$$

Similarly the sum of squares for sex alone will be found to be 0.4224. We have the
following analysis:


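TABLE 24.5
Further Analysis of Variance of Data of Table 24.1.

                                            Sum of Squares.   d.f.   Quotient.
Test for Sex
Between breed (estimation of constants)         0.7253          7     0.1036
Sex . . . . . . . . . . . . . . . . . .         0.3162          1     0.3162
Sex and breed . . . . . . . . . . . . .         1.0415          8

Test for Breed
Between sex (estimation of constants) .         0.4224          1     0.4224
Breed . . . . . . . . . . . . . . . . .         0.6191          7     0.0884
Sex and breed . . . . . . . . . . . . .         1.0415          8

Interaction . . . . . . . . . . . . . .         0.2300          7     0.0329
Between classes . . . . . . . . . . . .         1.2715         15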


Here, for instance, if we test for sex there are seven independent constants for breed 
and one for sex, the latter being the only one that interests us; and similarly for breed. 
On comparison with the residual 0.0227 both sex and breed are found to be significant.
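The two one-way sums of squares quoted above, and the sums attributable to each factor after the other has been fitted, follow from the marginal totals alone. The sketch below assumes the eighth breed total is 70.02 (implied by the grand total) and takes the joint reduction 1.0415 from the text; the helper `one_way_ss` is a hypothetical name.

```python
def one_way_ss(totals, counts):
    """Sum of squares between classes: sum(T^2 / n) - G^2 / N."""
    grand, n_all = sum(totals), sum(counts)
    return sum(t * t / c for t, c in zip(totals, counts)) - grand ** 2 / n_all

sex_totals, sex_counts = [331.73, 724.09], [171, 362]
# Last breed total (70.02) is assumed from the grand total 1055.82.
breed_totals = [247.59, 380.12, 60.10, 25.20, 22.84, 92.53, 157.42, 70.02]
breed_counts = [122, 192, 30, 13, 12, 47, 82, 35]

breed_alone = one_way_ss(breed_totals, breed_counts)   # text gives 0.7253
sex_alone = one_way_ss(sex_totals, sex_counts)         # text gives 0.4224

sex_and_breed = 1.0415                                 # joint reduction, 8 d.f.
sex_after_breed = sex_and_breed - breed_alone          # 1 d.f.
breed_after_sex = sex_and_breed - sex_alone            # 7 d.f.

residual_quotient = 0.0227
print(round(sex_after_breed / 1 / residual_quotient, 1),
      round(breed_after_sex / 7 / residual_quotient, 1))
```

The printed ratios are the quantities to be referred to the variance-ratio distribution with (1, 517) and (7, 517) degrees of freedom respectively.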


24.13. The reader may perhaps find the various tests of Examples 24.1 and 24.3
confusing, and we accordingly summarise our results for the case of unequal numbers in
subclasses.

In every case, except where each subclass contains not more than one member, an
estimate of the common variance $v$ may be obtained, with $N - pq$ d.f., by pooling the
sums of squares within the $pq$ subclasses. Call this $v_1$.




Homogeneity may then be tested (a) by considering the $pq$ classes as a single one-way
classification and comparing the quotient between means with $v_1$, or (b) by calculating
for either classification separately the estimates based on (24.19) and comparing them with $v_1$.

If homogeneity is rejected in favour of the additive effect of classes expressed by the
usual hypothesis, the sum of squares between all classes based on $pq - 1$ d.f. may be split
into independent sums related to the fitting of the constants and to an interaction term.
The latter can be compared with $v_1$ to test for interaction. If this is not significant, alternative
tests for effects between A- and between B-classes may be derived by testing the
sum of squares attributable to the fitting of the respective constants against $v_1$. These
tests are, in effect, tests of one class neglecting the effect of the other, and may not be
accurate if the latter effect is not negligible. It is probably better to fit constants to both
classes simultaneously in the first instance.


Proportionate Frequencies 


24.14. We have previously spoken of non-orthogonal data as meaning any classification
with unequal frequencies in the subclasses, but there is one other case of unequal
frequencies for which orthogonality exists, namely the one in which frequencies are proportionate,
i.e. there are marginal frequencies $l_j$, $m_k$, such that

$$n_{jk} = l_j m_k. \tag{24.31}$$


Here the means of A-classes are estimates of the individual corresponding a’s (though it 
must not be overlooked that they are based on different numbers of members in margins), 
and the sum of squares between A-means may be computed in the usual manner appro- 
priate to a one-way classification with unequal numbers. Similarly for B. The interactions 
may be estimated by subtracting the A- and B-sums from the sum of squares between 
classes. We leave it to the reader to verify these statements. 
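One way to carry out the verification numerically is sketched below. The marginal frequencies and the data are invented for illustration; the sketch checks that, with proportionate subclass numbers, the sum of squares due to fitting both sets of constants equals the sum of the two one-way sums of squares, so that the interaction is obtained by subtraction.

```python
import numpy as np

rng = np.random.default_rng(0)
l = np.array([1, 2, 3])        # hypothetical marginal frequencies l_j
m = np.array([2, 1, 4, 3])     # hypothetical marginal frequencies m_k
counts = np.outer(l, m)        # proportionate subclass numbers n_jk = l_j m_k

# One label pair per observation, then arbitrary illustrative values.
a_label = np.array([j for j in range(l.size) for k in range(m.size)
                    for _ in range(counts[j, k])])
b_label = np.array([k for j in range(l.size) for k in range(m.size)
                    for _ in range(counts[j, k])])
x = rng.normal(size=a_label.size) + 0.5 * a_label - 0.3 * b_label

def one_way_ss(labels, values):
    """Sum of squares between class means (unequal numbers allowed)."""
    grand = values.mean()
    return sum(np.sum(labels == g) * (values[labels == g].mean() - grand) ** 2
               for g in np.unique(labels))

def fitted_ss(design, values):
    """Sum of squares of least-squares fitted values about the grand mean."""
    coef, *_ = np.linalg.lstsq(design, values, rcond=None)
    return np.sum((design @ coef - values.mean()) ** 2)

# Additive model: mean + A-indicators + B-indicators (rank-deficient, which
# lstsq handles; the fitted values are unique).
design = np.hstack([np.ones((x.size, 1)),
                    np.eye(l.size)[a_label],
                    np.eye(m.size)[b_label]])

ss_constants = fitted_ss(design, x)
ss_a = one_way_ss(a_label, x)
ss_b = one_way_ss(b_label, x)
ss_between = one_way_ss(a_label * m.size + b_label, x)
interaction = ss_between - ss_a - ss_b

print(ss_constants, ss_a + ss_b, interaction)
```

Replacing `counts` by a non-proportionate table makes the first two printed numbers differ, which is the non-orthogonal situation of the preceding sections.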


Special Case of 2 × 2 × . . . Classification


24.15. The foregoing analysis can be extended to the n-way classification, but in 
the general case the solution of the equations becomes rather complex and the arithmetic 
a considerable nuisance. Where, however, the classifications are simple dichotomies the 
problem simplifies to a great extent. For instance, in equations (24.27), if there are only
two values of $a_j$, which we may take to be $+a$ and $-a$, we have

$$N_{.k} b_k = \sum_j n_{jk} \bar{x}_{jk} - n_{1k} a + n_{2k} a.$$

We have selected the $a$'s so that $\sum (a) = 0$, which implies that the mean $m$ is amalgamated
with the $b$'s. Substituting for the $b$'s in (24.26), we find

$$a \left\{ N_{1.} - \sum_k \frac{n_{1k}(n_{1k} - n_{2k})}{N_{.k}} \right\} = \sum_k n_{1k} \bar{x}_{1k} - \sum_k \frac{n_{1k}}{N_{.k}} \sum_j n_{jk} \bar{x}_{jk},$$

which reduces to

$$2a \left( \frac{n_{11} n_{21}}{N_{.1}} + \frac{n_{12} n_{22}}{N_{.2}} + \ldots \right) = \frac{n_{11} n_{21}}{N_{.1}} (\bar{x}_{11} - \bar{x}_{21}) + \frac{n_{12} n_{22}}{N_{.2}} (\bar{x}_{12} - \bar{x}_{22}) + \ldots \tag{24.32}$$

Thus $a$ is half the weighted mean of the differences of corresponding B-class means and may
be determined direct. So generally for a 2 × 2 × 2 . . . classification. The differences
may be tested for homogeneity by the $z$-test, which in this case reduces to the $t$-test.
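The closed form for a dichotomy can be compared with a direct least-squares fit. The sketch below uses invented subclass numbers and data, with the two A-effects written as $+a$ and $-a$ and weights $n_{1k} n_{2k} / N_{.k}$; it is an illustration of the result, not a reproduction of any table in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.array([[3, 5, 2, 4],
              [6, 1, 4, 2]])     # hypothetical subclass numbers n_1k, n_2k
q = n.shape[1]

# Hypothetical observations for each of the 2 x q subclasses.
cells = [[rng.normal(loc=j + 0.3 * k, size=n[j, k]) for k in range(q)]
         for j in range(2)]
means = np.array([[cells[j][k].mean() for k in range(q)] for j in range(2)])

# Closed form: 2a * sum_k w_k = sum_k w_k (xbar_1k - xbar_2k),
# with weights w_k = n_1k n_2k / N_.k.
w = n[0] * n[1] / n.sum(axis=0)
a_closed = np.sum(w * (means[0] - means[1])) / (2 * w.sum())

# Direct least squares with a_1 = +a, a_2 = -a and one free b_k per B-class.
rows, y = [], []
for j in range(2):
    for k in range(q):
        for value in cells[j][k]:
            row = np.zeros(1 + q)
            row[0] = 1.0 if j == 0 else -1.0
            row[1 + k] = 1.0
            rows.append(row)
            y.append(value)
coef, *_ = np.linalg.lstsq(np.array(rows), np.array(y), rcond=None)
a_least_squares = coef[0]

print(a_closed, a_least_squares)
```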


24.16. In view of the relative complexity of the non-orthogonal case, it is natural 
to wonder whether any serious error would be committed if we regarded the p x q table 
of array means as an ordinary two-way table with one member in each class and analysed
the variance accordingly. Evidently such a procedure sacrifices a lot of information about
variation in subclasses, but that is not the point. Is the analysis valid?

The hypothesis on which the analysis is based is equality of variance in subclasses. 
If the numbers in subclasses are very unequal the means based on them will have very 
unequal variances, and we expect that the analysis may be misleading. If, however, the 
numbers are close to equality the analysis will probably be approximately correct. 


Example 24.4 


Reverting once again to the data considered in earlier examples, we have the following 
analysis for the variance of the 2 x 8 table of class-means :— 


                          Sum of Squares.   d.f.   Quotient.
Between sex . . . . .         0.3032          1     0.3032
Between breed . . . .         0.2635          7     0.0376
Residual  . . . . . .         0.2387          7     0.0341
Totals  . . . . . . .         0.8054         15


The sum of squares between sex is the same as before, as it must be for a dichotomy, 
but the effect of breed is seriously underestimated and would not be judged significant by 
comparison with the interaction term, which is our residual. The numbers in the breed- 
classes are, in fact, too different to justify the approximation. 


The Missing Plot Technique 


24.17. The simplicity of the analysis of variance in the orthogonal case and the 
economy imported by keeping the number of values as low as possible often leads to the 
carrying out of experiments with only one member in each subclass. But this has a certain 
practical danger in that the value in a subclass may be lost through circumstances beyond 
the experimenter’s control. For instance, an animal may die in the course of an experiment, 
or a crop on a particular plot may be ruined by pest; or sometimes a record may actually 
be lost after measurements have been carried out. In such cases we may estimate the 
missing values and perform a variance-analysis in the following way. 


24.18. Consider in the first place a $p \times q$ classification with certain missing values,
$r$ in number. We assume as usual that the variate-values are expressible in the form

$$x_{jk} = a_j + b_k + \zeta_{jk} + m, \tag{24.33}$$

and we know that the "best" estimators of the constants are

$$m = \bar{x}_{..}, \qquad a_j = \bar{x}_{j.} - \bar{x}_{..}, \qquad b_k = \bar{x}_{.k} - \bar{x}_{..}. \tag{24.34}$$

The quantities on the right are, however, unknown to us because of the missing values.
Suppose that we estimate the constants by minimising

$$\sum{}' (x_{jk} - a_j - b_k - m)^2, \tag{24.35}$$

where the summation $\sum'$ takes place over known values. Our estimators are then determinate
and may be written $a_j'$, $b_k'$ and $m'$.




We will now estimate the missing value on the plot $(j, k)$ by the equation

$$X_{jk} = a_j' + b_k' + m'. \tag{24.36}$$

We have

$$\sum (x_{jk} - a_j - b_k - m)^2 = \sum{}' (x_{jk} - a_j - b_k - m)^2 + \sum{}'' (X_{jk} - a_j - b_k - m)^2, \tag{24.37}$$

where $\sum''$ denotes summation over the missing values. Let us now consider this as a
function to be minimised, involving the unknowns $a$, $b$, $m$ and $r$ further unknowns $X_{jk}$.
The equations giving the latter will be obtained by differentiating (24.37) with respect to
each $X_{jk}$, and in fact are typified by

$$X_{jk} = a_j' + b_k' + m',$$

that is to say, by (24.36). The other constants are given by such equations as

$$\sum{}' (x_{jk} - a_j' - b_k' - m') + \sum{}'' (X_{jk} - a_j' - b_k' - m') = 0. \tag{24.38}$$

The second term vanishes, and hence we obtain the same minimal values for $a_j'$, $b_k'$ and
$m'$ as by minimising (24.35) by itself. Furthermore, the equations of estimation (24.38)
may be written

$$\sum (x_{jk} - a_j' - b_k' - m') = 0, \tag{24.39}$$

where the summation takes place over all values, those of the observed $x$'s where known
and over the estimated $X$'s where values are missing.

It follows that if we write $X_{jk}$ for the $r$ missing values, ascertain the residual sum of
squares, which will be a function of observations and these $r$ unknowns, and minimise
it for variation in these unknowns, we shall obtain equations providing estimates of the
unknowns equivalent to (24.36). The following example illustrates the method.
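The equivalence just stated can be illustrated on invented data. In the sketch below a small two-way table is generated, two plots are treated as lost, and the missing values are estimated by two routes: by fitting the constants to the known values only, and by completing the table and adjusting the inserted values until their residuals in the ordinary orthogonal analysis vanish. The table size, the positions of the missing plots and all names are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 4, 5
x = rng.normal(size=(p, q)) + np.arange(p)[:, None] + 0.5 * np.arange(q)[None, :]
missing = [(0, 1), (2, 3)]           # hypothetical positions of the lost plots
known_cells = [(j, k) for j in range(p) for k in range(q)
               if (j, k) not in missing]

def design(cells):
    """Rows of the additive model m + a_j + b_k for the given cells."""
    rows = np.zeros((len(cells), 1 + p + q))
    for r, (j, k) in enumerate(cells):
        rows[r, [0, 1 + j, 1 + p + k]] = 1.0
    return rows

# Route 1: fit the constants to the known values only, then predict the
# missing plots from the fitted constants.
coef, *_ = np.linalg.lstsq(design(known_cells),
                           np.array([x[c] for c in known_cells]), rcond=None)
from_known = design(missing) @ coef

# Route 2: complete the table, then repeatedly reset each inserted value to
# row mean + column mean - grand mean of the completed table, which is the
# condition for its residual (and the derivative of the residual sum of
# squares with respect to it) to be zero.
filled = x.copy()
start = np.mean([x[d] for d in known_cells])
for c in missing:
    filled[c] = start
for _ in range(200):
    for j, k in missing:
        filled[j, k] = filled[j].mean() + filled[:, k].mean() - filled.mean()
from_completed = np.array([filled[c] for c in missing])

print(from_known, from_completed)
```

The second route is a simple relaxation scheme chosen for brevity; solving the stationarity equations directly, as in the example that follows, gives the same values.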


Example 24.5 (Yates, 1933b) 


The following table shows the measurements of intensity of infection of certain potato 
tubers under eight manurial treatments in ten blocks. 


TABLE 24.6
Intensity of Infection of Potato Tubers.

                                                  Blocks
Treatments     1        2        3        4        5        6        7        8        9        10      Totals
    1        3.55     2.29      b       2.00     3.34     3.83     3.86     3.50     2.23     2.91     27.51 + b
    2        2.30     4.03     2.54     2.82     3.29     2.93      f       2.55     2.20     2.30     24.96 + f
    3        3.96     3.62     3.46     2.50     2.94     3.70     3.82     2.54     3.18     3.69     33.41
    4        2.99     3.99     2.90     3.97     4.49     4.70     3.86      h       3.50     3.59     33.99 + h
    5         a       3.07     3.49     1.07     3.99     3.48     3.80     3.68     3.24     2.70     28.52 + a
    6        2.36     3.47     2.64     3.17     3.26     3.28      g        i       3.07     3.12     24.37 + g + i
    7        2.16     2.34     1.96     2.60     3.77      d       3.20     3.47     2.67     3.33     25.50 + d
    8        3.16     2.52     2.39     3.68      c        e       3.85     3.36     2.50     4.13     25.59 + c + e
  Totals    20.48    25.33    19.38    21.81    25.08    21.92    22.39    19.10    22.59    25.77     223.85 + a + b + c + d
             + a               + b               + c    + d + e  + f + g  + h + i                        + e + f + g + h + i




There are nine missing values in this table, indicated by the letters $a \ldots i$. Omitting
purely numerical terms, which are irrelevant for the purposes of minimisation, we have
for the total sum of squares,

$$a^2 + b^2 + c^2 + \ldots + i^2 - \tfrac{1}{80}(223.85 + a + b + c + \ldots + i)^2;$$

for the sum of squares between blocks,

$$\tfrac{1}{8}\{(20.48 + a)^2 + (19.38 + b)^2 + \ldots + (19.10 + h + i)^2\} - \tfrac{1}{80}(223.85 + a + b + c + \ldots + i)^2;$$

and for that between treatments,

$$\tfrac{1}{10}\{(27.51 + b)^2 + (24.96 + f)^2 + \ldots + (25.59 + c + e)^2\} - \tfrac{1}{80}(223.85 + a + b + c + \ldots + i)^2.$$

The residual sum of squares is the difference of the first and the sum of the second and
third of these expressions. For minimisation we differentiate with respect to $a, b, \ldots i$
in turn. On some arithmetic simplification we find

$$\begin{aligned}
63a + b + c + d + e + f + g + h + i &= 209.11 \\
a + 63b + c + d + e + f + g + h + i &= 190.03 \\
a + b + 63c + d - 7e + f + g + h + i &= 231.67 \\
a + b + c + 63d - 9e + f + g + h + i &= 199.35 \\
a + b - 7c - 9d + 63e + f + g + h + i &= 200.07 \\
a + b + c + d + e + 63f - 9g + h + i &= 199.73 \\
a + b + c + d + e - 9f + 63g + h - 7i &= 195.01 \\
a + b + c + d + e + f + g + 63h - 9i &= 239.07 \\
a + b + c + d + e + f - 7g - 9h + 63i &= 162.11
\end{aligned}$$

This set of linear equations can, of course, be solved by routine methods, but also by iterative
processes as follows:
The mean of existent values is 3.15. Assume this to be approximately the values of
$b, c \ldots i$. Then for $a$ we have, from the first of the above equations,

$$a = \tfrac{1}{63}\{209.11 - (8 \times 3.15)\} = 2.92.$$

Taking this value of $a$ and 3.15 for $c, d \ldots i$, we find for $b$ from the second equation,

$$b = \tfrac{1}{63}\{190.03 - (7 \times 3.15) - 2.92\} = 2.62.$$

Similarly, from the third equation,

$$c = \tfrac{1}{63}\{231.67 + (2 \times 3.15) - 2.92 - 2.62\} = 3.69,$$

and so on. On reaching $i$ we recalculate $a$ from the first equation, using the approximations
to the values of the other constants already obtained; and so on until our values do not
alter. In this case only a second approximation is necessary, the values being:


                       a      b      c      d      e      f      g      h      i
First Approx. . . .  2.92   2.62   3.69   3.27   3.76   3.26   3.60   3.88   3.22
Second Approx.  . .  2.88   2.58   3.73   3.33   3.76   3.32   3.61   3.89   3.22

These are our estimates of missing yields. The treatment means are found to be:

   1       2       3       4       5       6       7       8
 3.009   2.828   3.341   3.788   3.140   3.120   2.883   3.308
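The iterative process described in this example is what would now be called a Gauss-Seidel sweep. A minimal sketch follows; the coefficient matrix is assembled from the nine equations of the example (63 on the diagonal, 1 elsewhere, with $-7$ for two missing values in the same treatment and $-9$ for two in the same block), and the helper `couple` is a hypothetical name.

```python
import numpy as np

names = "abcdefghi"
A = np.ones((9, 9))
np.fill_diagonal(A, 63.0)

def couple(u, v, coeff):
    """Set the symmetric coefficient linking two missing values."""
    r, c = names.index(u), names.index(v)
    A[r, c] = A[c, r] = coeff

couple("c", "e", -7.0)   # both in treatment 8
couple("d", "e", -9.0)   # both in block 6
couple("f", "g", -9.0)   # both in block 7
couple("g", "i", -7.0)   # both in treatment 6
couple("h", "i", -9.0)   # both in block 8

rhs = np.array([209.11, 190.03, 231.67, 199.35, 200.07,
                199.73, 195.01, 239.07, 162.11])

# Start every unknown at the mean of the existing values, then sweep through
# the equations in order, always using the latest available values.
x = np.full(9, 3.15)
for sweep in range(25):
    for r in range(9):
        others = A[r] @ x - A[r, r] * x[r]
        x[r] = (rhs[r] - others) / A[r, r]

exact = np.linalg.solve(A, rhs)      # "routine methods" for comparison
for name, value in zip(names, x):
    print(name, round(float(value), 2))
```

The matrix is strongly diagonally dominant, which is why two sweeps already settle the second decimal place.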




24.19. The question now arises how we may analyse the variance of data for which 
missing values have been estimated in this way. 

The original data provided a classification with unequal numbers in subclasses and
can be analysed by the methods given earlier in the chapter; except that, since no subclass
contains more than one member, we cannot find a residual sum of squares within subclasses
based on $N - pq$ d.f. ($N - pq$, in fact, is a negative number.) For instance,
regarding the data as a one-way classification with $pq - r$ classes, we shall have an analysis
of this type:

                          Sums of squares     d.f.
Between classes*  . .                         p + q - 2
Residual  . . . . . .                         (p - 1)(q - 1) - r          (24.40)
Total . . . . . . . .                         pq - r - 1


The effect of the two classifications separately can be dealt with in the manner of 
Example 24.1. 


24.20. Two simplifications are possible. In the first place, since the minimisation 
of the residual is the same for the original data as for the data completed by estimates of 
missing values, we can use the latter to compute the residual precisely as for an orthogonal 
case, which simplifies the arithmetic. 

Secondly, it appears that to an adequate approximation we may substitute the esti- 
mated values for missing values and analyse the resulting material in the ordinary way 
as if it were orthogonal. If the proportion of missing values is high this approximation 
may perhaps break down, and in practice we should probably regard the experiment as 
ruined. More usually only a few records are missing, and the effect of replacing them by 
estimates is hardly likely to affect judgments of significance seriously. 


Example 24.6 


Continuing the analysis of the data of the previous example, we find, for the total sum 
of squares, 32-1012 with 70 d.f. The analysis of the completed data, that is to say the original 
data plus the estimates of missing values, is as follows :— 


                          Sum of Squares.   d.f.   Quotient.
Between blocks  . . .         9.7176          9     1.0797
Between treatments  .         6.5812          7     0.9402
Residual  . . . . . .        17.6902         54     0.3276
Totals  . . . . . . .        33.9890         70


* It is assumed that no row or column in the two-way classification is entirely empty. If it were, 
we should have to ignore it and confine attention to the remaining arrays. 




Treating the original data as a case of unequal class numbers we find :— 


                                  Sum of Squares.   d.f.   Quotient.
Between blocks and treatments        14.4110         16     0.9007
Residual  . . . . . . . . . .        17.6902         54     0.3276
Totals  . . . . . . . . . . .        32.1012         70

For blocks only:

                                  Sum of Squares.   d.f.   Quotient.
Between blocks  . . . . . . .         8.5690          9     0.9521
Remainder . . . . . . . . . .         5.8420          7     0.8346
Blocks and treatments . . . .        14.4110         16

For treatments only:

                                  Sum of Squares.   d.f.   Quotient.
Between treatments  . . . . .         6.2648          7     0.8950
Remainder . . . . . . . . . .         8.1462          9     0.9051
Blocks and treatments . . . .        14.4110         16

Whether we use the analysis of completed data or the more exact form, we see that
differences between blocks and between treatments are significant as judged by the residual 
variance. The two analyses are, in fact, not very different, and even with as many as nine 
missing values out of 80 we should not err by substituting estimated values and treating 
the data as orthogonal. 


Relationship with Regression Analysis 

24.21. The general n-way classifications to which variance-analysis may be applied 
are not necessarily determined by a measurable variate. As for contingency tables, rows 
or columns can be interchanged without affecting the analysis. We can, however, regard 
a multivariate frequency table as an n-way classification and apply variance-analysis to 
it; and just as regression and correlation analysis provide a refinement on contingency 
analysis because of the arrangement of the classes in order by reference to a variate, so we 
may to some extent refine the analysis of variance in such a case. 


24.22. Consider in the first instance a p x q table of frequencies in the form of a 
correlation table. We will suppose the A-classification to be according to the variate $x$
and the B-classification according to $y$. Let us now consider the hypothesis that the data
emanate from a normal bivariate population with zero correlation (or, somewhat more 
generally, that for any given y the x’s are distributed normally with the same mean and 
variance). We can then regard the data as a one-way classification according to y with 
unequal frequencies and analyse the variance in the usual form :— 


                        Sum of Squares.                                  d.f.       Quotient.
Between classes . .     $\sum_{j=1}^{q} n_j (\bar{x}_j - \bar{x})^2$     $q - 1$    $\dfrac{N \eta^2 \operatorname{var} x}{q - 1}$
Residual  . . . . .     $\sum_{i,j} (x_{ij} - \bar{x}_j)^2$              $N - q$    $\dfrac{N (1 - \eta^2) \operatorname{var} x}{N - q}$
Totals  . . . . . .     $N \operatorname{var} x$                         $N - 1$

Here $\bar{x}_j$ is the mean of the $n_j$ $x$-values in the $j$th $y$-class, $\bar{x}$ is the mean of all $N$ values, $x_{ij}$ is the
variate-value in the $i$th $x$-class and $j$th $y$-class, and there are $q$ $y$-classes. The quotients
are expressible in terms of the correlation ratio of $x$ on $y$, viz. $\eta_{xy}$ (cf. 14.23, vol. I, p. 351).
Now, on our hypothesis, the sums of squares between classes and the residual are
independently distributed in the Type III form, and hence the variance ratio

$$\frac{\eta^2}{1 - \eta^2} \cdot \frac{N - q}{q - 1} \tag{24.41}$$

can be tested in Fisher's distribution with $\nu_1 = q - 1$, $\nu_2 = N - q$. This is the test we
gave in 14.25 (vol. I, p. 353) and it is reached by an argument of essentially the same
kind.

24.23. Now suppose that our p x q table is normal but correlated ; or, somewhat 
more generally, that the values in arrays of constant y are normally distributed with the 
same variance but with means which vary linearly with y, say 


$$m_j = m + b y_j. \tag{24.42}$$

Then our data can be represented by the form

$$x_{ij} = m + b y_j + \zeta_{ij}, \tag{24.43}$$

where the $\zeta$'s are distributed normally with zero mean and the same variance $v$. Apart
from the constant $m$, the only unknown here is the constant $b$. Our least-squares estimates
(measuring from the means of $x$ and $y$) now lead to the familiar form for the regression
coefficient

$$b = \frac{\sum x_{ij} y_j}{\sum y_j^2}, \tag{24.44}$$

where summation takes place over all values observed. This is, of course, equivalent to

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}. \tag{24.45}$$


Further, the reduction in sum of squares attributable to fitting the constant $b$ is

$$N b \operatorname{cov}(x, y) = \frac{N \operatorname{cov}^2(x, y)}{\operatorname{var} y} = N r^2 \operatorname{var} x, \tag{24.46}$$

where $r$ is the correlation coefficient of the sample.
Our analysis of variance may then be written— 


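TABLE 24.7

Analysis of Variance of a Correlation Table

                                                    Sum of Squares.                          d.f.       Quotient.
Regression constant b . . . . . . . . . . . . .     $N r^2 \operatorname{var} x$             $1$        $N r^2 \operatorname{var} x$
Between classes (after regression is eliminated)    $N (\eta^2 - r^2) \operatorname{var} x$  $q - 2$    $\dfrac{N (\eta^2 - r^2) \operatorname{var} x}{q - 2}$
Residual  . . . . . . . . . . . . . . . . . . .     $N (1 - \eta^2) \operatorname{var} x$    $N - q$    $\dfrac{N (1 - \eta^2) \operatorname{var} x}{N - q}$
Totals  . . . . . . . . . . . . . . . . . . . .     $N \operatorname{var} x$                 $N - 1$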


This analysis gives us a test of the significance of the correlation coefficient in samples 
from an uncorrelated population and also of linearity of regression. 
In fact, if the parent correlation is zero, the parent value of $b$ is zero and the quotient
due to $b$ is independent of the sum of the other items in the analysis. Thus the ratio

$$\frac{N r^2 \operatorname{var} x}{N (1 - r^2) \operatorname{var} x / (N - 2)} = \frac{r^2 (N - 2)}{1 - r^2} \tag{24.47}$$

is distributed in Fisher's form with $\nu_1 = 1$, $\nu_2 = N - 2$. This is equivalent to saying that

$$\sqrt{\frac{r^2 (N - 2)}{1 - r^2}} \tag{24.48}$$

is distributed in "Student's" form with $N - 2$ d.f., which brings us back by a different
route to the test given in 14.15 (vol. I, p. 342).


24.24. Secondly, if we assume that the parent correlation is not zero but the regres- 
sion is linear, the sum of squares between classes after regression is eliminated is independent 
of the residual in Table 24.7, and hence the ratio 
$$\frac{N (\eta^2 - r^2) \operatorname{var} x / (q - 2)}{N (1 - \eta^2) \operatorname{var} x / (N - q)} = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N - q}{q - 2} \tag{24.49}$$

is distributed in Fisher's form with $\nu_1 = q - 2$, $\nu_2 = N - q$. This test (due to Fisher
himself) gives a test of linearity of regression in the normal case.

It should be noticed that this test is only approximate if the classification is one of
a normal population with broad groupings. If correlation exists, the distribution of a
bivariate normal sample in an array of finite width is not exactly normal, being the sum
of a number of normal distributions with slightly different means. Unless the grouping
is very coarse, this is not likely to invalidate tests of significance in practice.
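The entries of the analysis of variance of a correlation table can be computed directly from grouped data. The sketch below uses invented class centres, frequencies and observations; it forms the squared correlation ratio and the squared correlation coefficient, and from them the two variance ratios for testing zero correlation and linearity of regression. Variances are taken with divisor $N$, in keeping with the expression $N \operatorname{var} x$.

```python
import numpy as np

rng = np.random.default_rng(3)
y_levels = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical y-class centres
n_j = np.array([8, 12, 15, 9, 6])                # hypothetical class frequencies
y = np.repeat(y_levels, n_j)
x = 2.0 + 0.7 * y + rng.normal(size=y.size)      # linear regression plus error

N, q = x.size, y_levels.size
var_x = x.var()                                  # divisor N
class_means = np.array([x[y == level].mean() for level in y_levels])

ss_between = np.sum(n_j * (class_means - x.mean()) ** 2)
eta2 = ss_between / (N * var_x)                  # squared correlation ratio
r = np.corrcoef(x, y)[0, 1]                      # sample correlation

ss_regression = N * r ** 2 * var_x
ss_after_regression = N * (eta2 - r ** 2) * var_x
ss_residual = N * (1 - eta2) * var_x

# Variance ratio for zero correlation, d.f. (1, N - 2).
ratio_correlation = r ** 2 * (N - 2) / (1 - r ** 2)
# Variance ratio for linearity of regression, d.f. (q - 2, N - q).
ratio_linearity = (eta2 - r ** 2) / (1 - eta2) * (N - q) / (q - 2)

print(round(eta2, 4), round(r ** 2, 4),
      round(ratio_correlation, 2), round(ratio_linearity, 2))
```

Since a linear function of $y$ is constant within each $y$-class, $\eta^2$ can never fall below $r^2$, so the sum of squares between classes after regression is eliminated is never negative.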


24.25. Consider now the general regression formula for $p$ variates,

$$x_1 = b_2 x_2 + b_3 x_3 + \ldots + b_p x_p. \tag{24.50}$$

If we assume that the residuals $x_1 - \sum_{j=2}^{p} b_j x_j$ are distributed normally with
constant variance, our least-squares estimates of the regression coefficients are those given
by the usual theory, and the fitting of $(p - 1)$ constants reduces the sum of squares by
$N \operatorname{var} x_1 \, R^2$, where $R$ is the multiple correlation coefficient (cf. 15.16, vol. I, p. 380). We
then have the analysis:

                                            Sum of Squares.                          d.f.       Quotient.
Between classes (regression constants)      $N \operatorname{var} x_1 \, R^2$        $p - 1$    $\dfrac{R^2}{p - 1} N \operatorname{var} x_1$
Residual  . . . . . . . . . . . . . . .     $N \operatorname{var} x_1 \, (1 - R^2)$  $N - p$    $\dfrac{1 - R^2}{N - p} N \operatorname{var} x_1$
Totals  . . . . . . . . . . . . . . . .     $N \operatorname{var} x_1$               $N - 1$

If the regression is in fact linear of type (24.50), the residual quotient is independent of
that due to fitting regression constants, and the hypothesis may be tested by means of
the ratio

$$\frac{R^2}{p - 1} \cdot \frac{N - p}{1 - R^2}, \tag{24.51}$$

which is distributed in Fisher's form with $\nu_1 = p - 1$, $\nu_2 = N - p$. This brings us to
the distribution of $R^2$ given in 15.20.


24.26. It is to be observed that in (24.50) we may choose the variates $x_2, \ldots, x_p$
as we please. In particular, we can take them to be polynomials of a single variate. From
this point of view the analysis of variance links up with the theory of regression analysis, 
given in Chapter 22. If the polynomials are orthogonal we can fit the constants b one 
at a time, the fitting of any constant leaving unchanged the previous determination of those 
of lower orders. The reduction in sum of squares for each constant can be separately 
ascertained and corresponds to the loss of a further degree of freedom ; and at any stage 
we may test the residual variance to see whether any particular term is worth while in the 
sense that it makes a significant contribution to the total variance. The exact test, of 
course, depends on the usual assumptions of normality. 


24.27. The reader is now in a position to see a number of statistical topics which 
on the surface appear to be distinct as parts of a single theory. Regression analysis, with 
its subsidiary of correlation analysis, proceeds by the successive fitting of constants by 
least-squares. For the normal case this is equivalent to estimation by maximum likelihood. 
Partial and multiple regression, together with curvilinear regression, can all be subsumed
under this central idea. The fitting of each constant splits off a separate contribution to
the total variance which, under certain hypotheses, is independent of the others. Variance- 
analysis proceeds in much the same way, but is more general in the sense that it can deal 
with the classification of values, however determined. Our various exact tests of signifi- 
cance of homogeneity in variance, of linearity of regression, of significance of correlations 
in uncorrelated material, of the difference of two means where variances are equal, of the 
correlation ratios, of the multiple correlation coefficient—all derive ultimately from Fisher’s 
distribution of the variance-ratio in the normal case. 


The Analysis of Covariance 


24.28. Suppose that we have a one-way classification, possibly with unequal numbers, 
and that in each class the members present values not of a single variate, such as we have 
considered up to now, but pairs of variate-values typified by $x_{ij}$, $y_{ij}$, $j$ referring as usual
to class and $i$ to the number within the class. By the ordinary methods of variance-analysis
we can discuss the effect of classification either on the $x$-variate or on the $y$-variate; but
there also arises for consideration the effect of class-membership on the covariation of
$x$ and $y$. This leads us to an extension of the analysis of variance to that of covariance.


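24.29. By an easy extension of the results for a single variate we have, analogously to

$$\sum_{i,j} (x_{ij} - \bar{x}_{..})^2 = \sum_{i,j} (x_{ij} - \bar{x}_{.j})^2 + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2,$$

the equation in product terms

$$\sum_{i,j} (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..}) = \sum_{i,j} (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}) + \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}). \tag{24.52}$$

If we consider the whole sample as homogeneous the correlation between $x$ and $y$ is given by

$$r_0 = \frac{\sum (x_{ij} - \bar{x}_{..})(y_{ij} - \bar{y}_{..})}{\sqrt{\{\sum (x_{ij} - \bar{x}_{..})^2 \sum (y_{ij} - \bar{y}_{..})^2\}}}. \tag{24.53}$$

We have also the correlation between means of classes

$$r_m = \frac{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..})}{\sqrt{\{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})^2 \sum n_j (\bar{y}_{.j} - \bar{y}_{..})^2\}}} \tag{24.54}$$

and may calculate a correlation of residuals within classes

$$r_a = \frac{\sum (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j})}{\sqrt{\{\sum (x_{ij} - \bar{x}_{.j})^2 \sum (y_{ij} - \bar{y}_{.j})^2\}}}. \tag{24.55}$$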


24.30. If there is heterogeneity present we should expect these correlations to differ;
and similarly for the three kinds of regression of $y$ on $x$, such as

$$b_m = \frac{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..})}{\sum n_j (\bar{x}_{.j} - \bar{x}_{..})^2}. \tag{24.56}$$

The three correlations of (24.53)-(24.55) are, however, not additive, like sums of squares ; 
nor are the regressions corresponding. The covariances expressed by (24.52) are additive, 
but there is no simple test, such as exists for variance-ratios, to determine the significance 
of differences or ratios of covariances. Covariance analysis, however, is not primarily 
designed to test independence, but to examine whether there is any variation according
to class between the regressions of $y$ on $x$ within and between classes. Let us suppose
that there is some linear relation of the form

$$y - \mu_y = \beta (x - \mu_x). \tag{24.57}$$

Following the notation of E. S. Pearson, we write

$$C_{11j} = \sum_i (x_{ij} - \bar{x}_{.j})^2, \qquad C_{22j} = \sum_i (y_{ij} - \bar{y}_{.j})^2, \qquad C_{12j} = \sum_i (x_{ij} - \bar{x}_{.j})(y_{ij} - \bar{y}_{.j}), \tag{24.58}$$

$$C_{11a} = \sum_j C_{11j}, \qquad C_{22a} = \sum_j C_{22j}, \qquad C_{12a} = \sum_j C_{12j}, \tag{24.59}$$

$$C_{11m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})^2, \qquad C_{22m} = \sum_j n_j (\bar{y}_{.j} - \bar{y}_{..})^2, \qquad C_{12m} = \sum_j n_j (\bar{x}_{.j} - \bar{x}_{..})(\bar{y}_{.j} - \bar{y}_{..}), \tag{24.60}$$

and $C_{110}$, $C_{220}$, $C_{120}$ for the corresponding total sums of squares and products. We may
then exhibit the composition of the total sums of squares and products in the form of Table
24.8. The arithmetic of the analysis follows that of ordinary variance-analysis. We
shall give an example presently.


TABLE 24.8

Analysis of Variance and Covariance for One-Way Classification: Sums of Squares and
Products and Regression Coefficients.

Variation           d.f.       Sum of Squares,   Sum of Squares,   Sum of Products,   Regression
                               x-variate         y-variate         xy                 Coefficients
Within jth group    n_j − 1    C_11j             C_22j             C_12j              b_j = C_12j / C_11j
Within groups       N − p      C_11a             C_22a             C_12a              b_a = C_12a / C_11a
Between groups      p − 1      C_11m             C_22m             C_12m              b_m = C_12m / C_11m
Totals              N − 1      C_110             C_220             C_120              b_0 = C_120 / C_110


We now suppose that, apart from the regression effects represented by (24.57), the
variation of y is normal with constant variance v. We can then compile various estimates
of v from the residual variation after the effect of fitting regression constants has been 


removed. For instance, within classes we have for the estimator of v, with N − 2p degrees
of freedom,
$$\frac{1}{N - 2p} \sum_{i,j} \left\{ y_{ij} - y_{.j} - b_j (x_{ij} - x_{.j}) \right\}^2 = \frac{1}{N - 2p} \sum_j \left( C_{22j} - b_j C_{12j} \right)$$
The number of degrees of freedom follows from the fact that we have fitted a mean and
a regression coefficient to each of p classes, making a reduction of 2p in all. We then obtain
Table 24.9:


TABLE 24.9

Analysis of Covariance for One-Way Classification with Linear Regressions.

Variation due to                         d.f.         Sum of Squares

Deviations from linear regressions       N − 2p       Σ_{i,j} {y_ij − y_.j − b_j (x_ij − x_.j)}²
  within classes                                        = Σ_j (C_22j − b_j C_12j) = S_1

Differences among regressions            p − 1        Σ_{i,j} (b_j − b_a)² (x_ij − x_.j)²
                                                        = Σ_j b_j C_12j − b_a C_12a = S_2

Deviations within classes from           N − p − 1    Σ_{i,j} {y_ij − y_.j − b_a (x_ij − x_.j)}²
  linear regression b_a                                 = C_22a − b_a C_12a = S_1 + S_2

Deviations between classes from          p − 2        Σ_j n_j {y_.j − y_.. − b_m (x_.j − x_..)}²
  linear regression b_m                                 = C_22m − b_m C_12m = S_3

Differences between b_a and b_m          1            Σ_{i,j} {(b_a − b_0)(x_ij − x_.j) + (b_m − b_0)(x_.j − x_..)}²
                                                        = (b_a − b_m)² C_11a C_11m / C_110 = S_4

Total deviation from linear              N − 2        Σ_{i,j} {y_ij − y_.. − b_0 (x_ij − x_..)}²
  regression b_0                                        = C_220 − b_0 C_120 = S_1 + S_2 + S_3 + S_4




The reader will probably find it useful to check the expressions in the third column of 
Table 24.9 and to examine how the sum of squares of deviations from the regression line 
of the whole is analysed into the constituent items. 


24.31. Suppose now that we wish to test whether the relationship between x and y
can be represented by the formula (24.57), and that there is no material class-effect present.
Then $S_1$ of Table 24.9 should be an unbiassed estimator of $(N - 2p)v$ and should be
independent of the residual estimator $S_2 + S_3 + S_4$, which has $2p - 2$ d.f. We may therefore
test the hypothesis by the ratio
$$\frac{S_1}{N - 2p} \cdot \frac{2p - 2}{S_2 + S_3 + S_4} \qquad (24.61)$$

If this variance ratio is insignificant we consider next whether the regressions differ in
the p classes. For this purpose we compare the estimator derived from $S_2$ with that based
on $S_1$; i.e. the ratio
$$\frac{S_2}{p - 1} \cdot \frac{N - 2p}{S_1}, \qquad \nu_1 = p - 1, \quad \nu_2 = N - 2p \qquad (24.62)$$
will be significant if differences are to be regarded as real.

If this ratio is not significant, $S_1$ and $S_2$ may be pooled. Comparison of their sum
with $S_3$ will afford a test whether the relation between group means is linear. The ratio
for this purpose is
$$\frac{S_1 + S_2}{N - p - 1} \cdot \frac{p - 2}{S_3}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = p - 2 \qquad (24.63)$$
Finally, even if this ratio is not significant, it does not follow that the common regression
within groups is the same as the regression of the means of groups. To test this point
we consider the ratio
$$\frac{S_1 + S_2}{N - p - 1} \cdot \frac{1}{S_4}, \qquad \nu_1 = N - p - 1, \quad \nu_2 = 1 \qquad (24.64)$$
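The four ratios of (24.61) to (24.64) may be set out compactly in code. The Python sketch below is an illustration and not part of the original text: the function name `variance_ratios` is an assumption, the ratios are formed in the order in which the formulae are written, and the degrees of freedom attached to (24.61) are simply those of its two mean squares. The figures used are those of Table 24.11 in Example 24.7 (N = 35, p = 3). No significance levels are computed; the ratios would be referred to tables of the variance-ratio distribution.

```python
def variance_ratios(s1, s2, s3, s4, n_total, p):
    """Return {equation: (ratio, nu1, nu2)} for (24.61)-(24.64).

    s1..s4 are the sums of squares S_1..S_4 of Table 24.9;
    n_total is N and p the number of classes (p > 2 is needed for 24.63).
    """
    ms1 = s1 / (n_total - 2 * p)                # mean square from S_1
    ms_pooled = (s1 + s2) / (n_total - p - 1)   # mean square from S_1 + S_2
    return {
        "24.61": (ms1 / ((s2 + s3 + s4) / (2 * p - 2)), n_total - 2 * p, 2 * p - 2),
        "24.62": ((s2 / (p - 1)) / ms1, p - 1, n_total - 2 * p),
        "24.63": (ms_pooled / (s3 / (p - 2)), n_total - p - 1, p - 2),
        "24.64": (ms_pooled / s4, n_total - p - 1, 1),
    }


ratios = variance_ratios(1048.1, 175.4, 835.6, 19.3, n_total=35, p=3)
for eq, (ratio, nu1, nu2) in ratios.items():
    print(f"({eq})  ratio = {ratio:.3f}  nu1 = {nu1}  nu2 = {nu2}")
```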


Example 24.7 

A number of recruits are given a preliminary test to ascertain their suitability for a 
certain course of training. At the end of the training course they undergo a proficiency 
test. The marks for three groups of recruits from three different towns are— 


Group 1   Preliminary: 45, 50, 56, 58, 59, 60, 62, 64, 65, 75
          Proficiency: 46, 60, 52, 46, 48, 50, 55, 63, 58, 64

Group 2   Preliminary: 44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73
          Proficiency: 48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81

Group 3   Preliminary: 47, 52, 59, 60, 63, 66, 68, 69, 74, 76
          Proficiency: 43, 56, 51, 72, 60, 61, 55, 74, 72, 80


We are interested here in the efficiency of the preliminary test as a predictor of the 
proficiency test. We therefore consider the regression of the marks obtained in the latter 
(y) on those obtained in the former (x). We are, however, also very much interested in 
the question whether the regressions are the same, apart from purely sampling effects, 
in the three groups. Such a matter would naturally arise, for instance, if we were thinking 




of applying the same rejection standards in preliminary tests to all recruits, irrespective of 
their town of origin. 

Our scores are given to the nearest unit, and hence the variates are discontinuous. 
We will neglect this effect and assume that the scores are distributed approximately 
normally. 


About origin x = y = 50 the sums of squares and cross-products are:

            n      Σ(x)    Σ(y)    Σ(x²)    Σ(y²)    Σ(xy)
Group 1     10     94      42      1496     594      694
Group 2     15     162     257     2802     6101     3989
Group 3     10     134     124     2556     2776     2422


We can then calculate the quantities C. For instance,
$$C_{111} = 1496 - \frac{94^2}{10} = 612.4$$
$$C_{121} = 694 - \frac{94 \times 42}{10} = 299.2$$
$$C_{11a} = C_{111} + C_{112} + C_{113}, \text{ etc.}$$
We find the following table in the form of Table 24.8:


TABLE 24.10

Analysis of Variance and Covariance for Data of Example 24.7: Sums of Squares and Products
and Regressions

Variation              d.f.   Sum of Squares, x    Sum of Squares, y    Sum of Products      Regressions
Within first group     9      C_111 = 612.4        C_221 = 417.6        C_121 = 299.2        b_1 = 0.4886
Within second group    14     C_112 = 1052.4       C_222 = 1697.73      C_122 = 1213.4       b_2 = 1.1530
Within third group     9      C_113 = 760.4        C_223 = 1238.4       C_123 = 760.4        b_3 = 1.0000
Within groups          32     C_11a = 2425.2       C_22a = 3353.73      C_12a = 2273.0       b_a = 0.9372
Between groups         2      C_11m = 83.09        C_22m = 1005.01      C_12m = 118.57       b_m = 1.4270
Totals                 34     C_110 = 2508.29      C_220 = 4358.74      C_120 = 2391.57      b_0 = 0.9535
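The arithmetic leading to Table 24.10 can be reproduced directly from the marks. The Python sketch below is offered as an illustration only; the names `sums_about_mean` and `covariance_table` are assumptions of this sketch and not notation of the text. Sums are taken about the means themselves rather than about the working origin 50, which gives the same corrected quantities.

```python
# Marks of Example 24.7: (preliminary x, proficiency y) for each group.
GROUPS = [
    ([45, 50, 56, 58, 59, 60, 62, 64, 65, 75],
     [46, 60, 52, 46, 48, 50, 55, 63, 58, 64]),
    ([44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73],
     [48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81]),
    ([47, 52, 59, 60, 63, 66, 68, 69, 74, 76],
     [43, 56, 51, 72, 60, 61, 55, 74, 72, 80]),
]


def sums_about_mean(xs, ys):
    """Return (C11, C22, C12): corrected sums of squares and products."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    c11 = sum((x - mx) ** 2 for x in xs)
    c22 = sum((y - my) ** 2 for y in ys)
    c12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return c11, c22, c12


def covariance_table(groups):
    """Rows of a table in the form of Table 24.8, keyed by name."""
    rows = {}
    for j, (xs, ys) in enumerate(groups, start=1):
        rows[f"group {j}"] = sums_about_mean(xs, ys)
    # Within-groups row: sum of the separate group rows.
    within = tuple(sum(r[k] for r in rows.values()) for k in range(3))
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    total = sums_about_mean(all_x, all_y)
    # Between-groups row obtained by difference, using additivity.
    between = tuple(t - w for t, w in zip(total, within))
    rows.update({"within": within, "between": between, "total": total})
    return rows


table = covariance_table(GROUPS)
for name, (c11, c22, c12) in table.items():
    print(f"{name:8s} C11={c11:9.2f} C22={c22:9.2f} C12={c12:9.2f} b={c12 / c11:.4f}")
```

The between-groups row is found here by subtraction from the totals, relying on the additive property of sums of squares and products; computing it directly from the group means should give the same figures.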


A comparison of the three regressions within groups indicates some heterogeneity. 
It looks as if the preliminary test is not such a good predictor for the first group as for 
the others. We may proceed to test the reality of this effect by constructing Table 24.11 
on the lines of Table 24.9. For instance, 


$$S_1 = \sum_j (C_{22j} - C_{12j} b_j) = (417.6 - 299.2 \times 0.4886) + \text{(two similar terms)} = 1048.1.$$


We find:

TABLE 24.11

Analysis of Covariance of Data of Example 24.7: Linear Regressions.

Variation                             d.f.   Sums S                            Quotient
Deviations from regressions b_j       29     S_1 = 1048.1                      36.1
Differences b_j                       2      S_2 = 175.4                       87.7
Deviations from b_a                   31     S_1 + S_2 = 1223.5                39.5
Deviations of groups from b_m         1      S_3 = 835.6                       835.6
Difference between b_a and b_m        1      S_4 = 19.3                        19.3
Total                                 33     S_1 + S_2 + S_3 + S_4 = 2078.4
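The entries of Table 24.11 follow from those of Table 24.10 by the formulae of Table 24.9. The Python sketch below is illustrative only; the helper `residual_ss` and the constant names are assumptions of the sketch. Because it works from the rounded figures of Table 24.10 without rounding the regression coefficients, its results may differ from the printed ones by a few units in the first decimal place.

```python
# (C11, C22, C12) taken from Table 24.10.
GROUP_C = [(612.4, 417.6, 299.2), (1052.4, 1697.73, 1213.4), (760.4, 1238.4, 760.4)]
WITHIN = (2425.2, 3353.73, 2273.0)
BETWEEN = (83.09, 1005.01, 118.57)
TOTAL = (2508.29, 4358.74, 2391.57)


def residual_ss(c11, c22, c12):
    """Sum of squares about a fitted line: C22 - b*C12, with b = C12/C11."""
    return c22 - c12 * c12 / c11


s1 = sum(residual_ss(*c) for c in GROUP_C)   # deviations from the separate b_j
s1_plus_s2 = residual_ss(*WITHIN)            # deviations from the common b_a
s2 = s1_plus_s2 - s1                         # differences among the b_j
s3 = residual_ss(*BETWEEN)                   # deviations of group means from b_m
b_a = WITHIN[2] / WITHIN[0]
b_m = BETWEEN[2] / BETWEEN[0]
s4 = (b_a - b_m) ** 2 * WITHIN[0] * BETWEEN[0] / TOTAL[0]
total_ss = residual_ss(*TOTAL)               # deviations from b_0

print(f"S1={s1:.1f} S2={s2:.1f} S3={s3:.1f} S4={s4:.1f} total={total_ss:.1f}")
```

The four components should add to the total deviation from the regression $b_0$, which provides a check on the working.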


A comparison of the quotient 36.1 (29 d.f.) with the quotient of the remaining items,
257.6 (4 d.f.) indicates that there are real differences between classes. A single regression
equation will not represent all three class-relations. A comparison of the deviations from
regressions, 36.1 (29 d.f.), with the differences of regressions among themselves, 87.7
(2 d.f.), does not reject the hypothesis of equality of regressions within groups. We therefore
compare the deviations from $b_a$, 39.5 (31 d.f.), with the deviations of groups from $b_m$,
835.6 (1 d.f.). This is significant, suggesting that the hypothesis of linearity of regression
of group-means should be rejected.

The general result is to confirm our suspicion of heterogeneity. The correlation 
coefficients between x and y are— 


Within first group     0.592
Within second group    0.908
Within third group     0.784
Within groups          0.797
Between groups         0.410
Total                  0.723


Again the deviations between groups stand out as indicating heterogeneity. 
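These coefficients are obtained from the rows of Table 24.10 as the sum of products divided by the square root of the product of the two sums of squares. A minimal Python sketch of the calculation follows; the function name `correlation` and the dictionary layout are assumptions of the sketch, and the inputs are the rounded tabular figures.

```python
from math import sqrt

# (C11, C22, C12) for each row of Table 24.10.
ROWS = {
    "within first group": (612.4, 417.6, 299.2),
    "within second group": (1052.4, 1697.73, 1213.4),
    "within third group": (760.4, 1238.4, 760.4),
    "within groups": (2425.2, 3353.73, 2273.0),
    "between groups": (83.09, 1005.01, 118.57),
    "total": (2508.29, 4358.74, 2391.57),
}


def correlation(c11, c22, c12):
    """Product-moment correlation from corrected sums of squares and products."""
    return c12 / sqrt(c11 * c22)


r = {name: correlation(*c) for name, c in ROWS.items()}
for name, value in r.items():
    print(f"{name:20s} {value:.3f}")
```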


24.32. The analysis of covariance may be extended to the case where there is more 
than one independent variate. The regression coefficients are found in the usual way, 
and the sums of squares after regressions have been removed can be found and compared 
on the usual hypotheses. Suppose, for instance, there are two independent variates and 
a classification giving an analysis between classes and residual. We may represent the 
analysis thus :— 


                           Sum of Squares.           Sum of Products.
                   d.f.    x_1²    x_2²    y²        x_1 x_2    y x_1    y x_2
Between classes    n       A       B       C         P          Q        R
Residual           n′      A′      B′      C′        P′         Q′       R′
Totals             n″      A″      B″      C″        P″         Q″       R″


Our regressions are then:

                    b_1                                b_2
Between classes     (BQ − PR) / (AB − P²)              (AR − PQ) / (AB − P²)
Residual            (B′Q′ − P′R′) / (A′B′ − P′²)       (A′R′ − P′Q′) / (A′B′ − P′²)
Totals              (B″Q″ − P″R″) / (A″B″ − P″²)       (A″R″ − P″Q″) / (A″B″ − P″²)


The sums of squares C can then be reduced by eliminating regressions, i.e. by subtracting
$Qb_1 + Rb_2$, giving
$$C - \frac{BQ^2 - PQR}{AB - P^2} - \frac{AR^2 - PQR}{AB - P^2} = \frac{ABC - AR^2 - BQ^2 - CP^2 + 2PQR}{AB - P^2} \qquad (24.65)$$
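The reduction (24.65) may be verified numerically. In the Python sketch below, which is not part of the original text, the function name `two_covariate_fit` and the values of A, B, C, P, Q, R are hypothetical; the code merely confirms that subtracting $Qb_1 + Rb_2$ from C gives the same result as the closed form on the right of (24.65).

```python
def two_covariate_fit(A, B, C, P, Q, R):
    """Regressions on two variates and the reduced sum of squares.

    A, B, C are sums of squares of x1, x2, y; P, Q, R are the sums of
    products x1*x2, y*x1, y*x2, all from one row of the analysis.
    """
    det = A * B - P * P
    b1 = (B * Q - P * R) / det
    b2 = (A * R - P * Q) / det
    reduced = C - Q * b1 - R * b2  # C less the part removed by regression
    closed = (A * B * C - A * R * R - B * Q * Q - C * P * P + 2 * P * Q * R) / det
    return b1, b2, reduced, closed


# Hypothetical sums of squares and products, for illustration only.
b1, b2, reduced, closed = two_covariate_fit(A=10.0, B=8.0, C=20.0, P=3.0, Q=6.0, R=5.0)
print(b1, b2, reduced, closed)
```

The coefficients returned also satisfy the normal equations $Ab_1 + Pb_2 = Q$ and $Pb_1 + Bb_2 = R$, which is a further check on the expressions for $b_1$ and $b_2$.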


This and the analogous quantities with primes give independent estimators of the 
variance of the residual element, and a comparison to test eee may be made in 
the usual way. 


24.33. In a case such as that of Example 24.7 it is evident that a comparison of
y-means between groups is affected by what we know about the x-values. If we know nothing
about the latter, comparison of the y’s is a univariate problem and can be treated by the
methods already discussed, the difference of means, for example, being tested by the use
of standard errors or the t-test. But suppose that our x’s themselves are found to be different
between groups and that there is significant correlation between x and y. Then
it is possible that the relation, if any, between y’s in different groups is not, so to speak, 
an inherent quality of the variation of y, but is merely a reflection of their dependence on 
the x’s, which happen to exhibit significant differences. In Example 24.7, differences in 
proficiency between groups may be due simply to differences of ability which were present 
before the training began and, if so, should be shown by differences between groups in the 
preliminary scores. We should not then be able to conclude from proficiency scores alone 
that training in one group had a more marked effect than in another. The differences 
were there before the training was applied. 


24.34. If, then, we require to consider the effects of training alone on the groups, 
we may “correct” the y-values by deducting the estimates 


$$Y_{ij} = y_{..} + b_0 (x_{ij} - x_{..}) \qquad (24.66)$$

or other more general regression equations. This, so to speak, allows for differences due
to variations of the x-variate.

Assuming that one linear regression equation adequately describes the relationship
between y and x, so that the corrected values are
$$y_{ij} - Y_{ij} = y_{ij} - y_{..} - b_0 (x_{ij} - x_{..}), \qquad (24.67)$$
we see that the difference of the corrected means of two classes $y_{.j}$ and $y_{.k}$ is
$$y_{.j} - y_{.k} - b_0 (x_{.j} - x_{.k}). \qquad (24.68)$$

This may be regarded as the sum of two parts which are independent. The estimated
variance of the first part, $y_{.j} - y_{.k}$, is $2s^2/q$, where $s^2$ is the mean-square of the residual after
correcting for regression and the means of $y_{.j}$ and $y_{.k}$ are both based on q members. Similarly
the variance of $b_0$ is $s^2/A$, where A is the sum of squares of the x-variate entering into
the residual row of the analysis. Regarding the x’s as fixed from sample to sample, so
that our inference is conditional, we see that the variance of the difference (24.68) is given by
$$s^2 \left\{ \frac{2}{q} + \frac{(x_{.j} - x_{.k})^2}{A} \right\}. \qquad (24.69)$$

The ratio of the difference to the square root of this expression is distributed as “Student’s”
t, with degrees of freedom one fewer in number than those of the original residual.
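As an illustration of (24.68) and (24.69), which is not worked in the original text, the Python sketch below compares the corrected proficiency means of Groups 1 and 3 of Example 24.7, each based on q = 10 members. It is assumed here that the regression coefficient, the sum of squares A and the mean square $s^2$ are all taken from the within-groups (residual) line of Tables 24.10 and 24.11; the function name `corrected_difference` is likewise an assumption.

```python
from math import sqrt


def corrected_difference(ybar_j, ybar_k, xbar_j, xbar_k, b, s2, q, A):
    """Corrected difference of two class means, its variance, and t.

    b  : regression coefficient used for the correction
    s2 : residual mean square after correcting for regression
    q  : number of members in each of the two classes
    A  : sum of squares of x in the residual row
    """
    diff = (ybar_j - ybar_k) - b * (xbar_j - xbar_k)       # as in (24.68)
    var = s2 * (2.0 / q + (xbar_j - xbar_k) ** 2 / A)      # as in (24.69)
    return diff, var, diff / sqrt(var)


# Groups 1 and 3 of Example 24.7; within-groups quantities assumed as residual.
diff, var, t = corrected_difference(
    ybar_j=54.2, ybar_k=62.4, xbar_j=59.4, xbar_k=63.4,
    b=2273.0 / 2425.2, s2=1223.5 / 31, q=10, A=2425.2,
)
print(f"difference = {diff:.3f}, variance = {var:.3f}, t = {t:.3f} on 31 d.f.")
```

On these assumptions the uncorrected difference of −8.2 marks is reduced to about −4.45 after allowance for the preliminary scores, giving t of about −1.56 with 31 degrees of freedom.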


24.35. Similarly, if we have two independent variables $x_1$ and $x_2$, the corrected
difference of y-means is
$$y_{.j} - y_{.k} - \{ b_1 (x_{1j} - x_{1k}) + b_2 (x_{2j} - x_{2k}) \} \qquad (24.70)$$
where temporarily we write $x_{1j}$ for the mean of the variate $x_1$ in the jth class, and so on.
The variance of the part in curly brackets may be derived by considering the variance of
the general expression $\lambda b_1 + \mu b_2$. From the equations for $b_1$ and $b_2$ we have
$$b_1 = \frac{B \sum (y x_1) - P \sum (y x_2)}{AB - P^2}, \qquad b_2 = \frac{-P \sum (y x_1) + A \sum (y x_2)}{AB - P^2} \qquad (24.71)$$
where, as in 24.32, A and B are the sums of squares for $x_1$, $x_2$, and P is the cross-product.
Thus the coefficient of any y in $\lambda b_1 + \mu b_2$ is
$$\frac{(\lambda B - \mu P) x_1 + (\mu A - \lambda P) x_2}{AB - P^2}$$
Since the y’s are independent the estimated variance of $\lambda b_1 + \mu b_2$ is
$$\frac{s^2}{(AB - P^2)^2} \left\{ A (\lambda B - \mu P)^2 + 2P (\lambda B - \mu P)(\mu A - \lambda P) + B (\mu A - \lambda P)^2 \right\} = s^2 \, \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2}. \qquad (24.72)$$
Thus for the estimated variance of the corrected difference (24.70) we have
$$s^2 \left\{ \frac{2}{q} + \frac{\lambda^2 B - 2 \lambda \mu P + \mu^2 A}{AB - P^2} \right\} \qquad (24.73)$$
where $\lambda = x_{1j} - x_{1k}$ and $\mu = x_{2j} - x_{2k}$. As usual, the difference divided by the square
root of this quantity may be tested in the t-distribution.
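The simplification in (24.72) and the variance (24.73) may be checked numerically. The Python sketch below is illustrative and not part of the original text; the function names and all numerical values are hypothetical. It evaluates the quadratic form in its expanded and its reduced shape and then forms the variance of a corrected difference.

```python
def quad_expanded(A, B, P, lam, mu):
    """Left-hand form of (24.72), without the factor s^2."""
    det = A * B - P * P
    u = lam * B - mu * P
    v = mu * A - lam * P
    return (A * u * u + 2 * P * u * v + B * v * v) / det ** 2


def quad_reduced(A, B, P, lam, mu):
    """Right-hand form of (24.72), without the factor s^2."""
    return (lam * lam * B - 2 * lam * mu * P + mu * mu * A) / (A * B - P * P)


def variance_corrected_difference(s2, q, A, B, P, lam, mu):
    """Estimated variance (24.73) of the corrected difference (24.70)."""
    return s2 * (2.0 / q + quad_reduced(A, B, P, lam, mu))


# Hypothetical residual-row quantities and differences of class means.
A, B, P, lam, mu = 10.0, 8.0, 3.0, 2.0, 1.0
expanded = quad_expanded(A, B, P, lam, mu)
reduced = quad_reduced(A, B, P, lam, mu)
variance = variance_corrected_difference(s2=4.0, q=10, A=A, B=B, P=P, lam=lam, mu=mu)
print(expanded, reduced, variance)
```

With P = 0 and a single variate in use (μ = 0) the expression reduces to that of (24.69), as it should.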




24.36. Our account of the analysis of variance and covariance has not attempted 
to cover all the applications of the method in particular directions. We have concentrated 
so far as possible on the fundamental ideas and the broad lines of analysis to which they 
lead. Some further developments will be given in later chapters, but we must refer the 
reader who requires a complete acquaintance with the subject to the references given at 
the end of this chapter and the preceding. We will conclude our exposition with three 
final comments. 

(a) Part of our hypothesis throughout has been that the residual element ε has constant
variance from one subclass to another. In Chapter 26 we shall discuss methods of testing 
homogeneity in residual variance. For completeness we might perhaps have anticipated 
some of these tests in the present chapter, at least to the extent of exemplifying their use. 
We have not done so mainly for reasons of economy in space ; but the omission of mention 
of the point in foregoing examples should not lead the reader to overlook (as many writers 
do overlook) the necessity for testing variance-homogeneity where possible, if it is required 
as part of the hypothesis. 

(b) In the majority of our examples we have proceeded at once to analyses of variance
or covariance without dwelling on points which would require attention in any practical 
inquiry. For instance, since the primary function of many variance-analyses is to test 
the homogeneity of a set of class-means, the first stage would be to compute those means 
and examine whether they suggest any lack of homogeneity on intuitive grounds. Again, 
if heterogeneity is established, consideration of the means themselves, or of the primary 
data, will sometimes show how it arises. The student must never lose sight of his primary 
material. 

(c) Elaborating this point to some extent, we would emphasise that the analysis of 
variance, like other statistical techniques, is not a mill which will grind out results auto- 
matically without care or forethought on the part of the operator. It is a rather delicate 
instrument which can be called into play when precision is needed, but requires skill as 
well as enthusiasm to apply to the best advantage. The reader who roves among the 
literature of the subject will sometimes find elaborate analyses applied to data in order to 
prove something which was almost obvious from careful inspection right from the start ; 
or he will find results stated without qualification as “ significant ” without any attempt 
at critical appreciation. This is not the occasion to deliver a homily on the necessity for 
self-discipline in the use of advanced theoretical techniques, but the analysis of variance 
would provide quite a good text for a discourse on that interesting subject. 


NOTES AND REFERENCES 


For the analysis of variance where subclass frequencies are unequal, see Brandt (1933) 
and an important paper by Yates (1934a). Wilks (1938e) has considered the subject from 
the theoretical viewpoint and exhibited the main results determinantally. For the missing 
plot technique see Allan and Wishart (1930) and Yates (1933b). For the analysis of 
covariance see Fisher’s Statistical Methods, Bartlett (1934a), an appendix by E. S. Pearson
to a paper by Wilsdon (1934), Brady (1935), Wishart (1936), and Day and Fisher (1937). 
The last-mentioned paper works through a practical example in some detail and will 
repay study. 

See also references to the previous chapter. 




EXERCISES 


24.1. For a two-way classification with one member in each subclass show that,
for normal variation,
$$E \{ (x_{j.} - x_{..}) (x_{.k} - x_{..}) \} = 0,$$
and hence that the sums $\sum_j (x_{j.} - x_{..})^2$ and $\sum_k (x_{.k} - x_{..})^2$ are independent. Examine
how this breaks down for the non-orthogonal case.
24.2. Verify the arithmetic of Example 24.6. 


24.3. Generalise formula (24.73) in the following way. If there are m independent
variates, the variance of corrected differences is
$$s^2 \left\{ \frac{2}{q} + \sum_{r,s=1}^{m} \lambda_r \lambda_s c_{rs} \right\}$$
where $\lambda_r = x_{rj} - x_{rk}$, and $c_{rs} = A_{rs} / \Delta$, where $A_{rs}$ is the cofactor of $a_{rs}$ in the determinant
$\Delta = |a_{rs}|$, and $a_{rs} = \sum x_r x_s$ summed over the sample.
(Wishart, 1936.)


24.4. Derive by the analysis of variance the test of a regression coefficient given 
in 22.19. 


CHAPTER 25 
THE DESIGN OF SAMPLING INQUIRIES 


Influence of Theory on Sampling Design 


25.1. The reader who is accustomed to handling the results of a sampling investigation 
as they appear in everyday statistical work may have wondered more than once in previous 
chapters whether theory was not reaching out too far in advance of practice. It is true 
that for certain types of experimental inquiry, notably in agricultural and biological research, 
the precision of exact statistical tests does not seem out of place ; but in economic or social 
statistics, for example, there is often so much error and imperfection in the raw data that 
the application of refined methods of analysis would be a waste of time. It is clearly 
useless, and may even be dangerous, to exercise an elaborate mathematical technique on 
data which are suspect from the very start of the inquiry. If our theory is to be really 
serviceable to the statistician and not merely an enticing mental exercise it must be capable 
of solving practical problems. 


25.2. Now it has to be admitted that much of the material with which statisticians 
have to work at the present day cannot be treated by the methods expounded in the fore- 
going pages when sampling questions are concerned. The commonest reason, but by no 
means the only one, is that the sampling process by which the data were obtained was 
biassed. In such cases the statistician has to lay aside the refined implements of his craft 
and do the best he can with his refractory material in the light of his own judgment and 
commonsense. A good deal of current statistical work is of this kind, and there is even 
a section of thought which is inclined to depreciate the advanced theory of the subject as 
“academic ”’ in the sense that it is too remote from practical affairs to be worth studying. 
The misunderstanding is not likely to be removed by the counter-accusation sometimes 
launched by theoreticians that the theory is quite capable of being applied by anyone who 
has the ability to comprehend it. 


25.3. Fortunately there is a growing realisation that the two points of view can 
often be reconciled by collecting the data in such a form that the theory can be applied to 
it. If only enough care is taken at the initial stages of an inquiry there is no need for the 
appearance of imperfect data which defy exact analysis. Knowing beforehand what 
theoretical instruments are at our disposal, and armed with a clear understanding of what 
questions we are trying to answer, we can frequently frame the investigation so as to maxi- 
mise the information acquired with the minimum of effort. In short, the scope and nature 
of our theory itself dictates, to some extent, the form which the sampling inquiry should 
assume. In former times the statistician was usually asked to extract information from 
data which were collected by inexpert agents, frequently for quite different purposes. 
Nowadays he is still in the same position in some respects, but sometimes he is called in to 
advise on the design of the inquiry and can, within limits, determine the form in which the 
data are collected. He can make his theory applicable by selecting his sample in the 


proper way. 


25.4. The general theory of the design of sampling inquiries has not progressed far 
enough for us to be able to give a systematic account of it in this chapter. In some fields, 


particularly that of agricultural experimentation, it has reached quite an advanced degree 
of perfection ; in others there remain many problems unsolved and possibly many more 
which have not yet even been formulated. At the risk of some discontinuity of treatment, 
therefore, we shall only give in this chapter a number of instances in which theoretical con- 
siderations exert a considerable effect on the scope of a sampling inquiry, in order to illus- 
trate the field to be covered. There are, of course, many factors which ultimately deter- 
mine the form of an investigation, such as cost and expenditure of time, but they will 
not concern us here. For the present we shall be concerned solely with the extent to which 
theoretical considerations contribute to all the factors that have to be taken into account 
when an inquiry is designed. 


Some Preliminary Points 


25.5. There are certain preliminary points which, though obvious enough when stated 
explicitly, are often overlooked and cause a good deal of bad design. 

(a) The fundamental object of sampling is to obtain information about a population, 
and it is of the first importance to begin with a clear idea of what that population 
is. Imagine, for instance, that we are asked to ascertain whether pasteurised milk has 
a different feeding value from raw milk. In what population is this inquiry to be made: 
among children ? among the inhabitants of the British Isles ? among those who habitually 
drink milk or those who do not? among townspeople or among country folk? and so 
on. Again, suppose that we are given a new variety of barley and wish to know whether 
it has a heavier yield than a previously known type. Do we mean heavier in the usual 
barley-growing areas ? in every kind of climate or on the average over a series of different 
climatic conditions ? when subject to the same manurial treatments as those in current 
use ? and so on. 

(b) In a similar way, it is necessary to have an equally clear idea of what we are trying
to find out about the population. In our example of raw and pasteurised milk, are we 
content to know that there is (or is not) a differential effect for children as a whole ? or do 
we wish to ascertain whether any such effect varies at different ages, between sexes, or 
according to nutritional standards ? What exactly should we like to know? It is no use 
returning the facile reply “all about it” to this query, for our information must be limited
in virtue of the finite size of our sample. We must make up our minds what information 
we require and which questions have priority if it becomes necessary to sacrifice some of 
them for practical reasons. 

(c) Thirdly, we should consider what we know already about our population. This 
point becomes of particular importance when our prior knowledge indicates heterogeneity, 
for then we may, in effect, have to divide the population into sub-groups and sample separ- 
ately from each. In our milk example, it is to be expected that children of different ages 
may react differently, or that children from lower-class schools may respond differently 
from those in middle-class schools. Or again, in our barley example, the two varieties 
may compare quite differently on Hertfordshire loam and on Lincolnshire chalk. It would 
be misleading to lump all the comparisons together when we have strong reason to suspect 
heterogeneity beforehand. In effect, prior knowledge of this kind frequently dictates the 
types of question we ask under (b), and the two are often different facets of the same problem.

(d) As an extension of the same point, we may notice that prior knowledge about the 
population sometimes indicates what sort of averages to use and what sort of tests of 
significance it is proper to apply. Crop-yields, for instance, are known to be distributed 
in a form approaching the normal, so that arithmetic means are good estimates of parent 




means and the tests based on normal theory may be applied. Accident statistics, on the 
other hand, are often distributed in a modified Poisson form ; income statistics in a J-shaped 
form, and so forth. 

(e) A specification of the population and a decision as to the precise object of the
inquiry will usually determine certain parameters which it is required to estimate or certain 
hypotheses for test. In general the problem is one of estimation, but not necessarily so. 
In our case of pasteurised and raw milk, for instance, we should probably wish to know 
the exact amount of the difference between the effects of the two (a matter of estimation), 
not merely whether a difference existed (a matter of significance). We then wish to know, 
before the inquiry begins, whether the estimates we shall have are going to be accurate 
enough for our purpose; or alternatively, if the sample is of a given size, how accurate they
will be. It may not always be possible to answer such a question completely beforehand, 
since the sampling variances will in general depend on quantities which have to be estimated 
when the data are available, but it is always useful to consider in a general way what sort 
of magnitudes would be shown as significant and what values would leave us still in reason- 
able doubt. As a rule, matters such as this are closely related to sample size. 

(f) Finally, our estimates will be subject to experimental error and, in development 
of the last point, we have to try to find the form of experimental design which, while answer- 
ing our questions, does so with the minimum error. From a slightly different standpoint, 
if we can determine the amount of error which is admissible, the problem is to find the 
design which achieves no more than that error with the minimum expenditure of effort. 
Furthermore, we require to be able to estimate the extent of probable errors. In short, we 
require an efficient design, just as the engineer requires an efficient engine or the aircraft 
designer an efficient form of airscrew, and for exactly the same reasons. 


25.6. To sum up, our primary task in embarking on a sampling inquiry is to ascertain
as accurately as possible what is the population under examination, and what is the informa- 
tion about it which we require. If, as usually is the case, that information concerns statis- 
tical characteristics such as means and variances, or more generally frequency-distributions, 
our second task is to design an inquiry which will provide estimates of these unknown 
quantities and will, at the same time, provide estimates of their sampling error. It is not 
always possible, as we shall see later, to obtain full satisfaction in the reduction of error 
and the estimation of error simultaneously. Increased accuracy of estimation may mean 
loss of precision in our estimate of sampling error, so that although we are nearer the truth 
we do not know how near. There does not appear to be any single rule which will cover 
all the cases that can arise. We shall refer to a particular case of some interest in 25.39. 


Stratified Sampling 


25.7. We consider at the outset a case of fairly frequent occurrence in the sampling 
of existent populations. Suppose we are interested in the mean value of a variate x in 
some population $\Pi$; and that we know, or suspect, that the population is heterogeneous
in the sense that we can delimit sub-populations $\Pi_1, \Pi_2, \ldots, \Pi_k$ in which the distributions
according to x may differ. This type of case might, for example, arise if we were sampling
the population of a town for income, there being districts, wards or even streets which are 
known to be inhabited by classes living at different income-levels. 

Practical considerations alone may require that we draw a prescribed portion of the 
sample from each sub-population. For instance, with a town of 500,000 inhabitants it 




would be most tedious to sample by using random numbers applied to the whole town. 
We should probably divide the work among districts and blocks and select random samples 
within the blocks. This, however, is not to be confused with the division of the town into 
relatively homogeneous districts because of its heterogeneity. Either process is called 
stratification. The problem we shall discuss is this: If we have decided to draw a total 
sample of n members, and can assign at will the number $n_i$ drawn from the ith stratum
$\Pi_i$, subject to the condition $\sum (n_i) = n$, how should we choose the numbers $n_i$, or need we
choose them at all? Will our estimate of the mean value of x be better if we merely choose
n members at random from $\Pi$, or can we improve it by controlling the numbers $n_i$ and not
merely leaving them to chance?


25.8. Let $x_{ij}$ be the jth member of the sample from the ith sub-population, and let
the latter contain a number $N_i$ of members with mean $\mu_i$ and variance $\sigma_i^2$. If $\mu$ is the
mean of $\Pi$ we shall have
$$\mu = \frac{1}{N} \sum_{i=1}^{k} N_i \mu_i. \qquad (25.1)$$

We shall now seek for parameters $\lambda_{ij}$ such that our estimator of $\mu$, say t, is given by
$$t = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \lambda_{ij} x_{ij}, \qquad (25.2)$$
that is to say, is a linear estimator in the observed variate-values. We shall seek for that
estimator which is unbiassed and has minimum variance, i.e. for which
$$E(t) = \mu \qquad (25.3)$$
$$E \{ t - E(t) \}^2 = \text{minimum}. \qquad (25.4)$$
Substituting from (25.2) and (25.1) in (25.3), we find
$$E \Big\{ \sum_{i,j} \lambda_{ij} x_{ij} \Big\} = \frac{1}{N} \sum_i N_i \mu_i$$
and since $E(x_{ij}) = \mu_i$, this gives
$$\sum_i \mu_i \Big( \sum_j \lambda_{ij} - \frac{N_i}{N} \Big) = 0. \qquad (25.5)$$

For this to be generally true we must have
$$\sum_{j=1}^{n_i} \lambda_{ij} = \frac{N_i}{N}, \qquad (25.6)$$
a first condition on the λ’s. If $\lambda_{i.}$ is the mean of $\lambda_{ij}$ in the ith set we have
$$\lambda_{i.} = \frac{N_i}{N n_i}. \qquad (25.7)$$


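Now consider (25.4). The variance of t is the sum of k variances, for the samples from
sub-populations are independent. Consider then the variance of $\sum_j \lambda_{ij} x_{ij}$, remembering
that the population of $N_i$ members is finite. We have
$$\begin{aligned}
\text{variance} &= E \Big\{ \sum_j \lambda_{ij} (x_{ij} - \mu_i) \Big\}^2 \\
&= \sum_j \lambda_{ij}^2 \sigma_i^2 + \sum_{j \neq l} E \{ \lambda_{ij} \lambda_{il} (x_{ij} - \mu_i)(x_{il} - \mu_i) \} \\
&= \sum_j \lambda_{ij}^2 \sigma_i^2 - \sum_{j \neq l} \lambda_{ij} \lambda_{il} \frac{\sigma_i^2}{N_i - 1} \\
&= \frac{\sigma_i^2 N_i}{N_i - 1} \sum_j \lambda_{ij}^2 - \frac{\sigma_i^2}{N_i - 1} \Big( \sum_j \lambda_{ij} \Big)^2 \\
&= \frac{\sigma_i^2}{N_i - 1} \Big\{ n_i (N_i - n_i) \lambda_{i.}^2 + N_i \sum_j (\lambda_{ij} - \lambda_{i.})^2 \Big\}.
\end{aligned} \qquad (25.8)$$
This is clearly minimised only if
$$\lambda_{ij} - \lambda_{i.} = 0, \qquad (25.9)$$
that is, if all the λ’s for any sub-population are equal. This is what we should expect on
intuitive grounds, for there is no reason for weighting the sample members differently in
the same sub-sample.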
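Our minimal variance, say v, is then given from (25.8), by summing over i, as
$$\begin{aligned}
v &= \sum_i \frac{\sigma_i^2 (N_i - n_i)}{N_i - 1} \, n_i \lambda_{i.}^2 \\
&= \sum_i \frac{\sigma_i^2 (N_i - n_i) N_i^2}{(N_i - 1) N^2 n_i} \\
&= \frac{1}{N^2} \sum_i \frac{\sigma_i^2 N_i^3}{(N_i - 1) n_i} + \text{constant}.
\end{aligned} \qquad (25.10)$$

This is a minimum for variations in $n_i$ subject to $\sum n_i = n$ if
$$\frac{\partial}{\partial n_i} \Big\{ v - \rho \sum_i n_i \Big\} = 0,$$
where $\rho$ is an undetermined constant. This yields almost at once
$$n_i^2 \propto \frac{\sigma_i^2 N_i^3}{N_i - 1}. \qquad (25.11)$$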


25.9. If we know the population variances $\sigma_i^2$ and the numbers $N_i$ this equation
determines the numbers $n_i$; but in practice it is rather unlikely that we should know the
variances without knowing the means, in which case we should not have to sample to find
the mean of the whole population. Our result is not, however, useless. In the first place
we find for the estimator t
$$t = \sum_{i,j} \lambda_{i.} x_{ij} = \sum_i \frac{N_i}{N n_i} \sum_j x_{ij} = \frac{1}{N} \sum_i N_i x_{i.}, \qquad (25.12)$$
where $x_{i.}$ is the mean of the sample from the ith stratum,
so that the estimate is a weighted average of the sample means, the weights being proportional
to the population numbers $N_i$, not to the numbers $n_i$. Secondly, without knowing
the variances $\sigma_i^2$ exactly, we may sometimes reach approximations from prior knowledge
of the populations. Such values, without giving absolute accuracy, will at least represent
improvements on selecting the n’s by chance.




25.10. If the numbers $N_i$ are effectively infinite the formulae simplify, and, for
instance, instead of (25.11) we have

$$n_i \propto \sigma_i N_i, \qquad (25.13)$$


the sample number varying with the standard deviation in the stratum concerned, as well 
as its number of members. 


25.11. If there is no information available at all about the variances $\sigma_i^2$ the most
reasonable course in applying (25.11) appears to be to suppose them all equal. In such
a case, for large $N_i$ we have

$$n_i \propto N_i, \qquad (25.14)$$

or the sampling numbers are proportional to the population numbers. This is what we
might expect on intuitive grounds. If the populations are infinite the $n_i$'s are equal, which
again is in accordance with intuitive ideas.
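The allocation rules (25.11) and (25.14) can be compared numerically. The sketch below is only an illustration: the function names and the three strata are hypothetical, the variance is computed from the expression for $v$ that leads to (25.10), and the resulting sample numbers are left fractional, so in practice they would have to be rounded to whole numbers.

```python
from math import sqrt

def optimum_allocation(N, sigma, n):
    """Sample numbers with n_i^2 proportional to sigma_i^2 N_i^3 / (N_i - 1), as in (25.11)."""
    weights = [s * Ni * sqrt(Ni / (Ni - 1)) for Ni, s in zip(N, sigma)]
    total = sum(weights)
    return [n * w / total for w in weights]

def proportional_allocation(N, n):
    """Sample numbers proportional to the population numbers, as in (25.14)."""
    total = sum(N)
    return [n * Ni / total for Ni in N]

def variance_of_estimator(N, sigma, alloc):
    """v = (1/N^2) * sum of sigma_i^2 (N_i - n_i) N_i^2 / ((N_i - 1) n_i)."""
    N_total = sum(N)
    return sum(s ** 2 * (Ni - ni) * Ni ** 2 / ((Ni - 1) * ni)
               for Ni, s, ni in zip(N, sigma, alloc)) / N_total ** 2

# Hypothetical strata, chosen only for illustration.
N = [5000, 3000, 2000]      # population numbers N_i
sigma = [2.0, 5.0, 10.0]    # standard deviations sigma_i (assumed known)
n = 200                     # total sample number

opt = optimum_allocation(N, sigma, n)
prop = proportional_allocation(N, n)
v_opt = variance_of_estimator(N, sigma, opt)
v_prop = variance_of_estimator(N, sigma, prop)
```

With unequal standard deviations, as assumed here, the optimum allocation moves sample members towards the more variable strata and gives a smaller variance than proportional allocation.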


25.12. The above will serve as an illustration of the way in which theoretical require- 
ments can influence the scope of an inquiry conducted among an existent population. By 
seeking for an estimator with minimum variance we have been led to expressions deter- 
mining the allocation of sample numbers among the different strata—and incidentally, of 
course, we have derived expressions for the minimum variance, so that the maximum 
possible precision can be ascertained. The fact that some of our results depend on unknown 
constants suggests that in some circumstances it may be worth while conducting a
preliminary or “pilot” inquiry in order to estimate the unknowns and hence to improve the
precision of the main inquiry which is to follow. The possibilities of such pilot surveys 
have yet to be explored, but the technique appears to merit serious investigation. 


25.13. In passing, we may mention one other topic of great practical importance on 
which theory can throw a good deal of light, that of optimum size of a sampling unit. In 
sampling a human population of a town, for instance, need we take individuals as our 
units ? It would be easier to sample households, or streets, or even whole districts ; but 
do we lose anything by this method, and if so, how much? Furthermore, the grouping of 
individuals into units of larger size sometimes has a peculiar effect on correlations which 
may lead to erroneous conclusions, and a theoretical investigation may be required to safe- 
guard against error. We shall not pursue the subject further here—the sampling problem 
would require a book in itself—but the reader who is interested may like to consult some 
of the papers referred to at the end of the chapter. 


The Design of Experiments


25.14. For an existent population the flexibility of sampling technique is somewhat 
limited. We are given an aggregate of values, some of which are to be extracted for scrutiny, 
and no manipulation of the sampling can tell us more than exists, so to speak, already 
inscribed upon the population itself. Consequently the main line of endeavour in such 
cases lies in estimating with the greatest accuracy (which is largely a matter of choosing 
the right statistics and minimising sampling variability), or in ensuring that sufficient 
material is available to enable the requisite comparisons to be made with significance 
(which is largely a matter of sample size and selecting the most suitable tests of significance). 
Nothing can alter the population, and theory will, as a rule, only react upon the sampling 
process by some such method as has already been exemplified, e.g. in dictating that the
sampling must be random, in stratifying the population before the sampling is carried out,
and in deciding how limited resources can be expended to the best advantage. 


25.15. For hypothetical populations there are often wider possibilities, for the nature 
of the inquiry may itself determine which populations are to be studied, and the populations 
may, to a certain extent, be set up at will. For instance, if we are interested in an inquiry 
into the relationship between income and size of family in the United Kingdom, the popula- 
tion already exists and we cannot go outside it; whereas if we wish to discuss the effect 
of a poison on bacterial growth or of a fertiliser on the yield of barley we can not only 
reproduce experimental data ad libitum but can arrange the inquiry so as to confine it to 
certain populations (e.g., by considering only a given type of bacterium in fixed nutritional 
circumstances or at fixed temperatures), or we may extend the domain of consideration as 
far as purely practical limitations will allow (e.g., by growing barley in new surroundings 
or in new climates). This is rather a pretentious way of saying that we may experiment 
in a domain which, within limits, can be assigned at will. The statistician has a much 
greater scope for ingenuity in the design of experiments than in the design of sampling 
inquiries on existent populations because of the greater degree of control over the population 
under examination. 


25.16. In the classical ideal experiment, only the factors under consideration were 
allowed to vary, other conditions being kept as constant as laboratory practice would allow 
—in investigations concerning the relation between resistance and current in an electric 
circuit, for instance, attempts would be made to keep factors such as temperature and 
external magnetic effects strictly constant. It would be recognized that there would be 
residual errors which would affect the exactitude of the results, but these would be measur- 
able on certain assumptions. 


25.17. Statistical theory can, of course, deal with such cases, but it can also go farther 
and often wishes to do so. In the first place, it frankly admits the existence not only of 
experimental error (in the sense of aberration from a “true” value) but of the much wider
type of variation which gives rise to frequency-distributions in practice. Instead of isolating 
particular factors for study, it may wish to give full play to the disturbances which arise 
in practice in order to investigate what happens in “natural” conditions. For this reason,
statistical experiments are often complex in the sense that a number of factors are allowed 
to vary simultaneously. 

Secondly, the admission of outside influences which together make up what is generally 
called experimental error implies that it should be possible to estimate the extent of such 
error from the data themselves. We wish to obtain, not the functional relations between 
variables which may only exist under artificial conditions, but the stochastic relations
observed in practice.


25.18. The effect of this on experimental design is that the hypothetical population 
we consider is often a rather general one. Taking the case of trials of a new variety of 
barley as an example, we should wish to compare its yields with those of other varieties 
in different soil conditions, with different manurial treatments, in different years (so as to 
get variations in climate), and so on. Furthermore, to obtain estimates of the error due 
to other factors we usually have to replicate the experiment. A great number of
inter-comparisons fall to be made, and the process of design is essentially that of finding a form
of experiment which will permit all these comparisons and yet save as much unnecessary
labour as possible. 


Orthogonality 


25.19. To reduce the discussion to more concrete terms we will consider the testing 
of a new variety of barley. In order to study its behaviour under different soil conditions 
we will select a number of areas in which barley is grown and choose a block of ground in 
each. This will give us inter-soil comparisons. We will also arrange to carry the experi- 
ment on for a period of years, so that climatic variations may also be compared. The 
other factor in which we are interested is the response to certain manures, which we will 
take to be dung (D), potash (K), nitrogen (N), and phosphates (P).

Consider any block at any one place in any year. We will decide on certain standard 
quantities of the four manures and assume that for any manure either a dressing of this 
standard amount is to be given, or it is to be withheld. This simplifies the experiment, 
for then every manure either is or is not applied, and our results can be classified by simple 
dichotomies. Of course more complicated experiments can be devised to allow for different 
quantities of fertiliser, but the simpler case will be sufficient for our purposes. 

We have then set up a population which can be classified according to six qualities, 
place, time, and the application of four manures. Our results are intended to show whether 
there is any variation in yield between these conditions and various combinations of them. 
Of course, it does not follow in deductive logic that if there is significant variation from year 
to year in the particular years chosen there will always be temporal or climatic variation ; 
and similarly, if there is significant variation from place to place it does not follow that 
other soil conditions which have not been tested will show a significant variation. To 
arrive at such conclusions we have to perform an ordinary generalisation by induction. 
What we shall say, if significant results appear, is that in the regions tested, or for the years 
tested, there were significant variations, and that it therefore appears likely that soil and 
climate exert a material effect on yield—and we shall maintain this with more or less con- 
fidence according as our experience is wider or narrower. This is the familiar inductive 
inference which forms the basis of all scientific inquiry. 


25.20. Within any one block we shall wish to study the effect of manurial treatments
not only separately but in combination. We therefore divide the block into sixteen
compartments and treat them, respectively, with no manure, D, K, N, P, DK, DN, DP, KN,
KP, NP, KNP, DNP, DKP, DKN and DKNP. Here every possible combination appears
once and only once. To compare, for instance, the mean yields in the presence or absence
of dung we add the eight yields for plots on which no dung was spread and compare their
sum with the sum of the other eight. All the necessary comparisons can be made.
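The sixteen combinations and the dung comparison can be sketched in a few lines. The function names and the yields below are hypothetical: the yields are constructed so that dung adds exactly 3 units and each other manure adds 1, simply so that the comparison has a known answer.

```python
from itertools import product

MANURES = "DKNP"

def treatment_combinations(factors=MANURES):
    """All 2^k present/absent combinations, labelled like 'DK'; '' means no manure."""
    combos = []
    for flags in product([False, True], repeat=len(factors)):
        combos.append("".join(f for f, on in zip(factors, flags) if on))
    return combos

def main_effect(yields, factor):
    """Mean yield of plots receiving the factor minus mean yield of plots without it."""
    with_f = [y for label, y in yields.items() if factor in label]
    without = [y for label, y in yields.items() if factor not in label]
    return sum(with_f) / len(with_f) - sum(without) / len(without)

combos = treatment_combinations()

# Hypothetical yields: base 10, plus 3 for dung, plus 1 for each other manure applied.
yields = {c: 10 + 3 * ("D" in c) + sum(1 for f in "KNP" if f in c) for c in combos}
dung_effect = main_effect(yields, "D")
```

Because every other manure appears equally often with and without dung, its contribution cancels in the comparison and the constructed dung effect is recovered exactly.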

Data of this kind are said to be orthogonal. Each possibility arises an equal number of
times. The reason for the use of the word is that such material is orthogonal in the sense
we have considered in the analysis of variance. We saw in Chapters 23 and 24 that where 
cell-frequencies were equal the analysis was greatly simplified, and that under the custom- 
ary hypotheses the estimates of means were independent. It is not, of course, absolutely 
necessary to have orthogonal data—in fact, we have shown in Chapter 24 how to deal with 
the non-orthogonal case; but it is evidently a great convenience to be able to arrange 
for orthogonality, and no efficiency is lost by doing so. 




Replication 


25.21. If, as suggested above, we divide each block into 16 plots and treat each
differently, the analysis of variance of any block will have 15 degrees of freedom; and if we
cannot ignore any of the interactions there will be no residual variance due to “error”,
that is to say we cannot estimate the reliability of our comparisons. All the 15 possible
independent comparisons may be made, but we cannot decide whether differences are
significant in the sense that they may be due to the other factors which we have agreed
to allow to bear on the experiment, such as individual soil differences from plot to plot.
If we are to estimate such “error” we must give the factors which produce it an
opportunity of varying. This may be done by replicating the experiment, that is to say, by
repeating it in the same form. For instance, suppose that we set up four blocks and divide
each into 16 plots, applying our manurial treatments to each block. Then, assuming that
there are no significant interactions between blocks and treatments (a matter which we
can test by examining the interaction terms in the variance-analysis), we shall have 63
degrees of freedom, of which 15 are assignable to treatments and their interactions and the
remaining 48 to a “residual” term, the latter providing an estimate of experimental error.
We have exemplified this process in Chapter 23.


Randomisation 


25.22. Up to this point we have said nothing about the arrangement of our 16 plots 
within the block. Suppose we divide our block into plots of equal size. Is there any 
advantage in allocating the treatments systematically, or is it preferable to assign them 
at random ? 

We shall consider the relative merits of random and systematic arrangements in more 
detail below, but we can announce the general rule now: unless there is some good reason 
to the contrary, it is better to allot the treatments at random. Where possible, chance 
should be given full play. 


25.23. The justification for this rule in our present instance can be seen by reference 
to the section on randomised blocks in 23.41. We saw there that by randomising the 
allocation of plots we were able to preserve the z-distribution and hence to validate our 
tests of significance, even where normality in the parent form was not assumed. The 
process is essentially one of extending our hypothetical population. Instead of considering 
the observed yields as specimens of what might happen in repeated trials of the same variety 
of barley if the same manurial treatments were applied to the same plots, we consider the 
possible yields in repeated trials if the manurial treatments were applied in all possible 
ways to different plots. Our experiment is systematic in the sense that we prescribe a 
different treatment for each plot ; it is random to the extent that we allot the treatments 
to plots by chance. 


25.24. There is one source of possible confusion here which it is desirable to remove. 
In our agricultural example complications arise because of the physical contiguity of the 
plots, and we shall see below that it is often desirable to eliminate by special designs system- 
atic fertility gradients in the soil. In other classes of experiment where we desire orthogon- 
ality, the members need not be subject to this kind of effect, and often are not. Reverting 
to the example of raw versus pasteurised milk which has already been mentioned, suppose 
we take a simplified case and wish to measure whether the two different milks have different
effects on boys and girls. With a class of 40 children, 20 boys and 20 girls, we can proceed
in several ways. It is obviously useless to give raw milk to all the boys and pasteurised 
milk to all the girls, for then we have no measure of the differential effect, if any, for either 
sex alone. We might toss up in each case and allot raw or pasteurised milk to each child 
by chance ; but this would probably make the data non-orthogonal. To attain orthogon- 
ality, we should allot 10 children to each of the four sub-groups BP, GP, BR, GR (where 
B = boy, G = girl, P = pasteurised, R = raw). We then have an analysis of variance— 


                                              Degrees of freedom
    Between sexes                                      1
    Between milks                                      1
    Residual (including interactions)                 37

    TOTAL                                             39


This is analogous to a test of a cereal with two fertilisers and 10 replications. 

The question is, how should we allot the children to the four groups? Their sex, of 
course, is determined, but the nature of the milk they receive is at choice. It is here that 
the randomisation will help. The ten children of a specified sex who receive raw milk 
should be chosen at random from the 20 available. In this instance it might be thought 
that any method would do; but it is best to avoid the risk of bias. If the children were 
chosen by the teacher he might tend to select the 10 bigger boys or the 10 brighter boys. 
If they were chosen alphabetically, we might get brothers and sisters automatically receiv- 
ing the same treatment; and so on. The randomisation process avoids all systematic 
effects of this kind and brings us a stage nearer to obtaining an unbiassed answer to our 
questions. 
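The random allotment described above might be carried out as follows. This is a sketch only: the function `allot_milk`, the child labels and the seed are hypothetical, and a pseudo-random generator stands in for the use of Sampling Numbers.

```python
import random

def allot_milk(children, per_group=10, seed=None):
    """Within each sex, choose `per_group` children at random for raw milk;
    the remaining children of that sex receive pasteurised milk."""
    rng = random.Random(seed)
    groups = {}
    for sex in ("B", "G"):
        members = [name for name, s in children if s == sex]
        raw = set(rng.sample(members, per_group))
        groups[sex + "R"] = sorted(raw)
        groups[sex + "P"] = sorted(m for m in members if m not in raw)
    return groups

# Hypothetical class list: 20 boys and 20 girls, identified by label only.
children = ([(f"boy{i:02d}", "B") for i in range(20)]
            + [(f"girl{i:02d}", "G") for i in range(20)])
groups = allot_milk(children, seed=1)
```

The four sub-groups BR, BP, GR and GP each contain 10 children, so the data remain orthogonal, while the choice of which children receive raw milk is left to chance.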


Sensitivity of a Test 


25.25. In some cases, where the variate is discontinuous, the nature of the test of 
significance which we propose to apply may make a difference to the form of the experiment. 
If we are testing a certain hypothesis which can produce a specified number m of experi- 
mental results which are acceptable as conforming to the hypothesis, whereas other 
hypotheses produce a number n of other results, we clearly want to keep m as small as 
possible compared with n. The ideal case, of course, is that of the “crucial” experiment
in which the hypothesis can only give one result and other hypotheses give a different 
result. The result then proves or disproves the truth of the hypothesis, and no test of 
significance arises. In statistical practice we do not as a general rule perform crucial 
experiments, but we can sometimes design an experiment so that it is more crucial, if the 
expression be allowed, than alternative methods. 


25.26. Consider, for instance, the case of a cashier who claims to be able to detect 
good money from false at a glance. To test this ability we spread ten coins before him, 
tell him that p are good, and ask him to point them out. What number of good coins p 
should we include among the ten ? 

If the cashier had no power of discrimination and there are $p$ good coins, the
probability that he would guess right by chance is

$$1 \Big/ \binom{10}{p},$$

for the total number of ways of selecting $p$ from 10 is the denominator of this fraction and
only one of them is right. Now we want to choose $p$ so as to minimise the probability of
such an event, i.e. so as to maximise $\binom{10}{p}$. This is clearly done when $p = 5$, so that we
ought to have five good and five bad coins in the set. Any other number would increase
the probability that he might be right by chance and hence decrease the sensitivity of the
experiment.
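The choice of five good coins can be confirmed by tabulating the number of ways of selecting $p$ coins from 10. The names used in this short sketch are hypothetical.

```python
from math import comb

def chance_of_perfect_guess(p, total=10):
    """Probability of picking exactly the p good coins out of `total` by pure guessing."""
    return 1 / comb(total, p)

# Number of possible selections for each number p of good coins.
ways = {p: comb(10, p) for p in range(11)}
best_p = max(ways, key=ways.get)
smallest_chance = chance_of_perfect_guess(best_p)
```

With five good coins there are 252 possible selections, so a cashier with no power of discrimination would be right with probability 1/252.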


Latin Squares 


25.27. We now proceed to consider a different type of design, which has been freely 
applied in agriculture but may also be applied to other forms of inquiry. Suppose we 
have a variety of barley to test and five different treatments to apply. We will assume 
that replication has been considered necessary and will replicate five times, the same number 
as the treatments. We will then divide our block into 25 plots like a chessboard (though 
the plots may be rectangular and need not be exact squares, provided they are all the same 
size). Each row may be considered a replication of the five treatments, and this itself 
involves the appearance of each treatment once and only once in each row. Can we extend 
the arrangement and ensure that in addition the treatments will occur just once in each 
column ? 

The answer is affirmative, as the following example shows:

    A B C D E
    B C A E D
    C E D A B        (25.15)
    D A E B C
    E D B C A

An arrangement of this kind is called a “Latin square”. It was studied extensively by
Euler in the eighteenth century, though not of course from the statistical viewpoint.


25.28. The advantage of this arrangement lies in the fact that it eliminates possible 
correlational effects due to fertility gradients in the soil or accidental circumstances which 
may exercise a “patchy” influence on the whole block. If we could be sure that there
were no such influences at work, and that the soil was entirely homogeneous in the block, 
it would not matter where the treatments were placed; but by imposing the restriction 
that no treatment appears more than once in the same row or column we remove at least 
horizontal and vertical gradients from our comparisons. Suppose in fact that there were 
gradients running across the block and down it. When we work out the mean yield of the 
treatment A we shall add together five values, one of each in the various rows and columns. 
Similarly for B, so that a comparison of A and B is not affected by the systematic influences, 
which work equally on both. 

It is not, of course, true that the Latin square arrangement eliminates every effect due 
to soil heterogeneity. There might be systematic effects running diagonally which might 
still remain. It is, however, clear that in removing the effects in two perpendicular direc- 
tions we have substantially improved the comparison of mean yields as compared with 
a systematic arrangement. 
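Both statements can be illustrated with the 5 × 5 square given above, read row by row as ABCDE, BCAED, CEDAB, DAEBC, EDBCA. In the sketch below the fertility surfaces are hypothetical: the first has gradients down the rows and across the columns only, the second has a diagonal component.

```python
SQUARE = ["ABCDE", "BCAED", "CEDAB", "DAEBC", "EDBCA"]

def is_latin(square):
    """True if every letter occurs exactly once in each row and in each column."""
    letters = set(square[0])
    if len(square) != len(letters):
        return False
    rows_ok = all(len(row) == len(letters) and set(row) == letters for row in square)
    cols_ok = all(set(col) == letters for col in zip(*square))
    return rows_ok and cols_ok

def treatment_totals(square, fertility):
    """Total fertility contribution received by each treatment."""
    totals = {}
    for r, row in enumerate(square):
        for c, letter in enumerate(row):
            totals[letter] = totals.get(letter, 0) + fertility(r, c)
    return totals

# Row and column gradients only: every treatment receives the same total.
straight = treatment_totals(SQUARE, lambda r, c: 3 * r + 2 * c)

# A diagonal effect (hypothetical) is not removed by the design.
diagonal = treatment_totals(SQUARE, lambda r, c: r * c)
```

Under the first surface each treatment total is 50, so comparisons between treatments are unaffected; under the second the totals differ from treatment to treatment.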




25.29. The analysis of variance of a $p \times p$ Latin square may be carried out in the
following form:

    Sum of squares            d.f.
    Between rows              p − 1
    Between columns           p − 1
    Between treatments        p − 1
    Residual                  (p − 1)(p − 2)
    TOTAL                     p² − 1                (25.16)

and the four constituent sums are, on the hypothesis of homogeneity, distributed as $\chi^2$
independently. Before proving this result we will consider an example.


Example 25.1 (from Thomson, Brit. J. Educ. Psych., 1941, 11, 135; data by S. D. Nisbet).


A set of children were divided into four equal groups and each group was given four 
lists of words to test spelling ability. Each list formed one of four different types of test 
which we denote by A, B, C, D. The arrangement of the experiment is shown in the 
following table, together with the total scores of the corresponding groups :— 


                          Groups of children
    Lists         1          2          3          4        TOTALS
      1         A  81      B  41      C  44      D  53        219
      2         D  38      A  97      B  42      C  49        226
      3         C  31      D  43      A  67      B  36        177
      4         B  57      C  33      D  43      A  81        214
    TOTALS        207        214        196        219        836


For instance, the first group of children had the first list of test A, the second of test
D, and so on. No group had the same lists as another group, and each list was used exactly
once. The scores (corresponding to yields in the agricultural case) were in fact the number 
of words spelled wrongly in a prior test but correctly in this test. 

The above table, of course, does not represent anything corresponding to the physical 
layout of an agricultural experiment, but it shows how a similar object can be secured to 
the avoidance of contiguous effects. Since it is possible that some relationship may exist 
between the lists of words and the tests (e.g. by accident one list might be particularly 
unsuitable for a test), we wish to ensure that not only will each group of children have 
each of the four tests, but that no list shall be given more than once and every list at least 
once. This is precisely what the Latin square accomplishes. The fact that the diagonal 
arrangement of the letters is systematic does not affect the present inquiry, though in an
agricultural experiment a systematic diagonal fertility gradient might affect comparisons
between treatments.
An analysis of variance on the usual lines gives the following results:

                            Sum of Squares.    d.f.    Quotient.
    Lists (rows)                  359.5          3       119.83
    Groups (columns)               74.5          3        24.83
    Tests (treatments)           4626.5          3      1542.17
    Residual                      606.5          6       101.08
    TOTALS                       5667.0         15


The differences between lists are evidently not significant, from which we should conclude
that they appear to be on a par so far as these tests are concerned. The quotient due to
groups indicates that the children are more alike than chance would lead us to expect, but
not significantly so, for the variance ratio 101.08/24.83 = 4.1, $\nu_1 = 6$, $\nu_2 = 3$, is not
significant. On the other hand, the quotient due to tests is very significant, the ratio
1542.17/101.08 = 15.3, $\nu_1 = 3$, $\nu_2 = 6$, being beyond the 1-per-cent. point. We conclude
that there do exist differences between the tests.
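The arithmetic of this example can be retraced as follows. The sketch uses the usual correction-term method; the function name is hypothetical, and two entries that are illegible in the printed table (the score 33 in list 4, group 2, and the total 177 for list 3) are taken here as the values implied by the marginal totals.

```python
# Test given to each list (row) and group (column), and the corresponding scores.
LETTERS = ["ABCD", "DABC", "CDAB", "BCDA"]
SCORES = [[81, 41, 44, 53],
          [38, 97, 42, 49],
          [31, 43, 67, 36],
          [57, 33, 43, 81]]

def latin_square_anova(letters, scores):
    """Sums of squares, degrees of freedom and mean squares for a p x p Latin square."""
    p = len(scores)
    grand = sum(map(sum, scores))
    correction = grand ** 2 / p ** 2
    row_tot = [sum(row) for row in scores]
    col_tot = [sum(col) for col in zip(*scores)]
    trt_tot = {}
    for r in range(p):
        for c in range(p):
            trt_tot[letters[r][c]] = trt_tot.get(letters[r][c], 0) + scores[r][c]
    ss = {
        "rows": sum(t * t for t in row_tot) / p - correction,
        "columns": sum(t * t for t in col_tot) / p - correction,
        "treatments": sum(t * t for t in trt_tot.values()) / p - correction,
        "total": sum(x * x for row in scores for x in row) - correction,
    }
    ss["residual"] = ss["total"] - ss["rows"] - ss["columns"] - ss["treatments"]
    df = {"rows": p - 1, "columns": p - 1, "treatments": p - 1,
          "residual": (p - 1) * (p - 2), "total": p * p - 1}
    ms = {k: ss[k] / df[k] for k in ("rows", "columns", "treatments", "residual")}
    return ss, df, ms

ss, df, ms = latin_square_anova(LETTERS, SCORES)
ratio_tests = ms["treatments"] / ms["residual"]    # compared with 3 and 6 d.f.
ratio_groups = ms["residual"] / ms["columns"]      # compared with 6 and 3 d.f.
```

The sums of squares agree with those tabulated above, and the two ratios are those quoted in the discussion of the tests and the groups.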


Construction of Latin Squares 

25.30. The number of possible Latin squares of order $p$ is very large for high values
of $p$. There are, for example, 576 squares of order 4; 161,280 squares of order 5; 812,851,200
of order 6 and 61,479,419,904,000 of order 7. Up to this order they have been enumerated.
Although many examples of squares of higher orders are known, the problem of enumeration
for $p \geq 8$ awaits solution. Details and examples will be found in Fisher and Yates'
Statistical Tables.

By interchanging rows and columns the square can always be brought to a form in
which the top row and left-hand column are in the order ABC, etc. It is then said to be
a “standard square”. For instance, there are four standard squares of the fourth order:

    A B C D      A B C D      A B C D      A B C D
    B A D C      B C D A      B D A C      B A D C
    C D B A      C D A B      C A D B      C D A B        (25.17)
    D C A B      D A B C      D C B A      D C B A

From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns and
all rows except the first. (There is no point in permuting the first row, because the result
would be a repetition of squares already obtained with an interchange of the letters
A . . . D, not an essentially different layout.) The total number of squares, as stated
above, is therefore 4 × 144 = 576.
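For so small an order the count can be checked by direct enumeration. The following brute-force sketch (hypothetical function names, letters coded as the numbers 0 to p − 1) builds every Latin square row by row and picks out the standard squares; it is practicable only for very small $p$.

```python
from itertools import permutations

def latin_squares(p):
    """Yield every Latin square of order p as a tuple of rows (brute force)."""
    perms = list(permutations(range(p)))

    def extend(rows):
        if len(rows) == p:
            yield tuple(rows)
            return
        for cand in perms:
            # A new row is admissible if it repeats no symbol in any column.
            if all(cand[c] != row[c] for row in rows for c in range(p)):
                yield from extend(rows + [cand])

    yield from extend([])

def is_standard(square):
    """True if the top row and the left-hand column are both in natural order."""
    first = tuple(range(len(square)))
    return square[0] == first and tuple(row[0] for row in square) == first

squares = list(latin_squares(4))
standard = [s for s in squares if is_standard(s)]
```

The enumeration gives 576 squares of order 4, of which 4 are standard, in agreement with the figures in the text.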

It is only necessary to specify the standard squares. To select a Latin square at 
random we choose a standard form at random and then permute rows and columns at 
random, the randomising process being most conveniently carried out by Sampling 
Numbers. For squares of order 8 or more, where the standard types have not been
enumerated, we can only choose one of those which has, and hence select one at random from a
restricted set of all possible squares.
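The selection procedure for the fourth order might be sketched as follows. The function name is hypothetical, a pseudo-random generator is assumed in place of Sampling Numbers, and the list holds the four standard squares of order 4 written out row by row.

```python
import random

# The four standard squares of order 4, each written row by row.
STANDARD_4 = [
    ["ABCD", "BADC", "CDBA", "DCAB"],
    ["ABCD", "BCDA", "CDAB", "DABC"],
    ["ABCD", "BDAC", "CADB", "DCBA"],
    ["ABCD", "BADC", "CDAB", "DCBA"],
]

def random_latin_square(standard_squares, rng=random):
    """Choose a standard square at random, then permute all columns
    and all rows except the first at random."""
    square = rng.choice(standard_squares)
    p = len(square)
    cols = list(range(p))
    rng.shuffle(cols)
    rest = list(range(1, p))
    rng.shuffle(rest)
    rows = [0] + rest
    return ["".join(square[r][c] for c in cols) for r in rows]

layout = random_latin_square(STANDARD_4, random.Random(0))
```

Since each of the 4 × 144 = 576 squares arises from exactly one standard square and one pair of permutations, every square of order 4 is equally likely to be chosen.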




Analysis of Variance for Latin Squares 
25.31. We must now justify our assertion that the Latin square may be analysed
in the form (25.16), and that the z-test applies to the variance ratios which arise in the
analysis.

For an ordinary two-way classification we have

$$\sum (x_{jk} - x_{\cdot\cdot})^2 = \sum (x_{j\cdot} - x_{\cdot\cdot})^2 + \sum (x_{\cdot k} - x_{\cdot\cdot})^2 + \sum (x_{jk} - x_{j\cdot} - x_{\cdot k} + x_{\cdot\cdot})^2 .$$

Thus, if $x_r$ is the mean of rows and $x_c$ that of columns in the Latin square, we have, writing
$\bar{x}$ for $x_{\cdot\cdot}$,

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_{rc} - x_r - x_c + \bar{x})^2 \qquad (25.18)$$

and the three parts on the right are distributed independently as $\chi^2$ with $p - 1$, $p - 1$ and
$(p - 1)(p - 1)$ degrees of freedom respectively.

Now

$$\sum (x_{rc} - x_r - x_c + \bar{x})^2 = \sum (x_t - \bar{x})^2 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2
 + 2\sum (x_t - \bar{x})(x_{rc} - x_r - x_c - x_t + 2\bar{x}) \qquad (25.19)$$

where $x_t$ is the mean of treatments.

Consider the cross-product term in (25.19). The summation takes place over all $p^2$
values in the Latin square. Let us confine our attention to the summation for some
particular treatment. For this summation the factor $x_t - \bar{x}$ is constant. Summation for
the other factor gives

$$\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x}) = p x_t - \sum x_r - \sum x_c - p x_t + 2p\bar{x} \qquad (25.20)$$

and since one treatment occurs in each row and column,

$$\sum x_r = \sum x_c = p\bar{x}, \qquad (25.21)$$

and hence the sum (25.20) vanishes.

Thus the cross-product in (25.19) vanishes also and we have

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_t - \bar{x})^2
 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2 . \qquad (25.22)$$

This gives us the analysis of the sums of squares, and it only remains to show that the third
term on the right in (25.22) is independent of the fourth. It will then follow that the four
terms are distributed independently with $p - 1$, $p - 1$, $p - 1$ and $(p - 1)(p - 2)$ degrees
of freedom.

The required property of independence can be established directly, but it also follows
from considerations of symmetry in the Latin square which have an interest of their own.
We have regarded the square as composed of rows and columns, with treatments allotted
in a certain way; but by rearrangement we can equally well regard it as composed of rows
and treatments with columns allocated in a certain way. For instance, if we take the
first standard square in (25.17) we may write it:

                     Treatment:
                  A      B      C      D
    Rows:   1    C1     C2     C3     C4
            2    C2     C1     C4     C3
            3    C4     C3     C1     C2
            4    C3     C4     C2     C1

where, for instance, treatment A occurs in row 1, column 1 (C1), row 2, column 2 (C2), and
so on. This, of course, is not a physical layout, but that is immaterial for present purposes.
It follows that since the sum of squares between columns is independent of the residual in
(25.22), so also is that between treatments.

The variance analysis, then, takes the form

                    Sum of Squares.                                     d.f.
    Rows            $\sum (x_r - \bar{x})^2$                            p − 1
    Columns         $\sum (x_c - \bar{x})^2$                            p − 1
    Treatments      $\sum (x_t - \bar{x})^2$                            p − 1              (25.23)
    Residual        $\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2$      (p − 1)(p − 2)
    TOTALS          $\sum (x_{rc} - \bar{x})^2$                         p² − 1
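The identity behind this table holds for any set of values laid out on a Latin square, which is easily confirmed numerically. In the sketch below the function name is hypothetical, the values are arbitrary pseudo-random numbers, and the layout is assumed to be the first standard square of order 4 (rows ABCD, BADC, CDBA, DCAB); every sum extends over all $p^2$ cells.

```python
import random

def latin_decomposition(letters, x):
    """Return the total sum of squares and its four parts:
    rows, columns, treatments and residual."""
    p = len(x)
    mean = sum(map(sum, x)) / p ** 2
    xr = [sum(row) / p for row in x]            # row means
    xc = [sum(col) / p for col in zip(*x)]      # column means
    xt = {}                                     # treatment means
    for r in range(p):
        for c in range(p):
            xt[letters[r][c]] = xt.get(letters[r][c], 0.0) + x[r][c] / p
    cells = [(r, c) for r in range(p) for c in range(p)]
    total = sum((x[r][c] - mean) ** 2 for r, c in cells)
    rows = sum((xr[r] - mean) ** 2 for r, c in cells)
    cols = sum((xc[c] - mean) ** 2 for r, c in cells)
    trts = sum((xt[letters[r][c]] - mean) ** 2 for r, c in cells)
    resid = sum((x[r][c] - xr[r] - xc[c] - xt[letters[r][c]] + 2 * mean) ** 2
                for r, c in cells)
    return total, rows, cols, trts, resid

rng = random.Random(0)
letters = ["ABCD", "BADC", "CDBA", "DCAB"]
x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(4)]
total, rows, cols, trts, resid = latin_decomposition(letters, x)
```

Apart from rounding error, the total equals the sum of the four parts, whatever values are placed in the cells.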


25.32. The above form provides a homogeneity test of the usual kind. If the test
proves significant of heterogeneity we may, in the usual way, consider the hypothesis that

$$x_{rc} = \lambda + a_r + b_c + c_t + \zeta_{rc}, \qquad (25.24)$$

where $\zeta_{rc}$ is normally distributed about zero mean. We leave it to the reader to show, as
in Chapter 23, that in such an event the residual mean square is an unbiassed estimate of
the variance of $\zeta$ with $(p - 1)(p - 2)$ degrees of freedom.


25.33. As in the case of randomised blocks, it appears that under certain general 
conditions the z-distribution is reproduced approximately for fixed values which are per- 
muted in all the permissible ways consistent with the Latin square design. We omit an 
investigation into this result (for which see Welch, 1937) as the algebra is considerably 
more complicated than for randomised blocks. The result has been confirmed by a limited 
number of experiments. 


Graeco-Latin and Orthogonal Squares. 
25.34. If the two squares

    A B C D        A B C D
    B A D C        C D A B
    C D A B        D C B A        (25.25)
    D C B A        B A D C

are superposed we have the arrangement

    AA   BB   CC   DD
    BC   AD   DA   CB
    CD   DC   AB   BA        (25.26)
    DB   CA   BD   AC

in which every possible pair of letters (XY being regarded as different from YX) appears
just once. Such a pair of squares is said to be orthogonal. The form (25.26) is sometimes
written with Greek letters instead of the second Roman set; hence the name of
Graeco-Latin square. It is also possible to superpose a third factor which we will denote by the
numerals 1-4 in such a way that each combination of any pair of types occurs just
once, e.g.

    Aα1   Bβ2   Cγ3   Dδ4
    Bγ4   Aδ3   Dα2   Cβ1
    Cδ2   Dγ1   Aβ4   Bα3        (25.27)
    Dβ3   Cα4   Bδ1   Aγ2


Complete sets of orthogonal squares (i.e. those in which there are $p - 1$ factors for a $p \times p$
square) are known for all prime $p$ and for $p = 4$, 8 and 9. Curiously, there is no set for
$p = 6$. Up to and including $p = 7$ they have been enumerated.

We shall not enter here into the use of these squares in experimental design. They 
are generalisations of the Latin square in which, by suitable arrangements, several factors 
can be tried out simultaneously, so that all possible combinations of pairs occur an equal 
number of times. 
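The defining property of orthogonal squares, that every pair of symbols occurs just once when two squares are superposed, is simple to test. The sketch below checks three squares of order 4 written row by row: a Latin set, a second set in which a, b, c, d stand for the Greek letters α, β, γ, δ, and a set of numerals. The names are hypothetical.

```python
from itertools import combinations

LATIN = ["ABCD", "BADC", "CDAB", "DCBA"]
GREEK = ["abcd", "cdab", "dcba", "badc"]   # a, b, c, d stand for alpha, beta, gamma, delta
DIGIT = ["1234", "4321", "2143", "3412"]

def are_orthogonal(s1, s2):
    """True if superposing the two squares gives every ordered pair exactly once."""
    p = len(s1)
    pairs = {(s1[r][c], s2[r][c]) for r in range(p) for c in range(p)}
    return len(pairs) == p * p

mutually_orthogonal = all(are_orthogonal(a, b)
                          for a, b in combinations([LATIN, GREEK, DIGIT], 2))
```

The three squares are orthogonal in pairs, so they form a complete set for $p = 4$ (that is, $p - 1 = 3$ factors). A square superposed on itself fails the test, since only the four pairs AA, BB, CC, DD then occur.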


Confounding 


25.35. It will be evident that if we wish to consider in full a classification according 
to several variates, particularly with replications, the number of individual members in 
the sample may be very large. For instance, if we wish to test a variety of barley with 
three different applications of four types of fertiliser, there must be 81 yields even without 
replication, if we want to make all the comparisons possible. Physical considerations may 
make a layout of an experiment on such a scale impossible. The difficulty is possibly more 
serious in experiments on expensive animals such as cows. 

Where economy in the size of sample is a very material factor we may be able to reduce 
the sample at the expense of sacrificing some of the less important comparisons. For 
example, to consider once again the case of barley and the effect of fertilisers: we shall 
undoubtedly wish to compare yields of D and not-D, K and not-K, P and not-P, N and
not-N. We may also wish to compare first-order interactions of the type DK and not-D, K.
But it is quite possible that interactions of higher order, such as the effect of dung in the 
presence of two other fertilisers, are negligible. Where we are prepared to assume that this 
is so, on the basis of prior evidence or otherwise, we can dispense with certain information 
and still make the comparisons we wish while retaining properties of orthogonality. 


25.36. Consider, as an illustration, an experiment with three fertilisers, each of which 
is applied or not applied, say N, P and K, and four replications. In the ordinary way
there would be 32 plots and we should have an analysis of variance as follows, assuming 
that block-treatment interactions may be regarded as part of the residual :— 

    Sum of squares      d.f.
    Blocks                3
    N                     1
    P                     1
    K                     1
    NP                    1
    NK                    1
    PK                    1
    NPK                   1
    Residual             21

    TOTAL                31





Now suppose that we divide our main blocks into two sub-blocks, the first containing
the treatments
    O (none), NP, NK, PK        (25.28)
and the second the treatments
    N, P, K, NPK        (25.29)
We may then analyse the variance as follows, regarding the sub-blocks as blocks of four
plots each:


    Sum of squares      d.f.
    Blocks                7
    N                     1
    P                     1
    K                     1
    NP                    1
    NK                    1
    PK                    1
    Residual             18

    TOTAL                31


In fact, if we wish to compare the yields with N and those without N, i.e.

    N + NPK + NP + NK
with
    O + PK + P + K,

it will be seen that we add two members from (25.28) and two from (25.29), so the difference
is not affected by block differences; and similarly for the other comparisons. Such a
design is said to be balanced, and the interaction NPK is confounded with block-differences,
since in the eight blocks it cannot now be isolated from block effects. The advantage of
the second design over the first is that, without losing anything appreciable in comparisons
between treatments, we have gained a good deal in the assessment of block effects; for the
residual has only declined from 21 to 18 d.f. whereas the sum of squares between blocks
has increased from 3 to 7 d.f.
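
The balance claimed here can be verified by enumeration. The sketch below is illustrative only: it assumes the sub-blocks are formed by the parity of the number of fertilisers applied (which reproduces the two sets of treatments listed above), and the helper names `label`, `contrast` and `block_total` are hypothetical.

```python
from itertools import product

FACTORS = "NPK"

def label(levels):
    """Name of a treatment combination, e.g. (1, 0, 1) -> 'NK', (0, 0, 0) -> 'O'."""
    name = "".join(f for f, on in zip(FACTORS, levels) if on)
    return name or "O"

def contrast(levels, effect):
    """Sign (+1 or -1) with which a treatment enters the contrast for `effect`."""
    sign = 1
    for f in effect:
        sign *= 1 if levels[FACTORS.index(f)] else -1
    return sign

def block_total(block, effect):
    """Sum of contrast signs over one sub-block; zero means the effect is
    compared within the sub-block and is free of block differences."""
    return sum(contrast(t, effect) for t in block)

treatments = list(product((0, 1), repeat=3))
sub_block_1 = [t for t in treatments if sum(t) % 2 == 0]   # O, NP, NK, PK
sub_block_2 = [t for t in treatments if sum(t) % 2 == 1]   # N, P, K, NPK

for effect in ["N", "P", "K", "NP", "NK", "PK", "NPK"]:
    print(effect, block_total(sub_block_1, effect), block_total(sub_block_2, effect))
```

Every main effect and first-order interaction has total zero in each sub-block, whereas the NPK contrast takes the same sign throughout a sub-block and so coincides with the difference between sub-blocks; this is what is meant by saying that it is confounded.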


25.37. The ideas of orthogonality, randomisation, balance and confounding have 
been developed to an advanced degree and with great ingenuity, particularly by Fisher 
and Yates. The slight sketch we have given of the methods in this chapter is intended to 
be no more than illustrative of the way in which the theory of experimental design is capable 
of development, at least in certain fields, and the manner in which efficiency may be imported 
into a practical inquiry by a due regard to theoretical requirements of the design. For a 
comprehensive account of this branch of the subject the reader should consult Fisher’s 
Statistical Methods and Design of Experiments, Yates (1937b), and a useful introductory
account by Goulden (1939). At this point we leave these particular topics and return to
certain general matters.


Design and Randomisation 

25.38. Whenever an inference is to be made, and particularly where hypothetical 
populations are concerned, the reader will find it useful to ask himself what precisely is the 
population under consideration. We can illustrate the point very usefully by discussing 




a subject on which there has recently been difference of authoritative opinion—that of 
occasional conflict between the requirements of balancing and randomisation. 


25.39. Consider in the first place the testing of a cereal under two treatments, denoted 
by A and B; and to simplify matters as much as possible, suppose we are to sow eight 
plots in a straight line. In what order shall we allot the treatments ? 

If the plots are not too large so that the row covers a big area, it is quite possible that 
there may be a trend of fertility in the soil itself which will affect yields differentially and 
hence interfere with comparisons which we might make. Suppose that we do wish to 
guard against a fertility gradient so far as possible. We might then decide on one of the 
“balanced ” arrangements : 


    AABBBBAA        (25.30)
    ABBAABBA        (25.31)
    ABABBABA        (25.32)


As will be easily seen, if there is a linear gradient in fertility along the row the means of
A and B treatments respectively will be affected to the same extent and hence their
difference unaffected. For instance, consider (25.30) and suppose the linear gradient is
represented by an additive factor q + kp, k = 1, ..., 8. On the hypothesis that the
remaining effect consists of a constant a for A-treatments with a normal residual ξ, and
similarly a constant b for B, the yields are

\[
\begin{aligned}
\text{A-treatments:}\quad & q+p+a+\xi_1,\;\; q+2p+a+\xi_2,\;\; q+7p+a+\xi_7,\;\; q+8p+a+\xi_8 \\
\text{B-treatments:}\quad & q+3p+b+\xi_3,\;\; q+4p+b+\xi_4,\;\; q+5p+b+\xi_5,\;\; q+6p+b+\xi_6
\end{aligned}
\]

with means

\[ \tfrac{1}{4}(4q+18p) + a + \tfrac{1}{4}(\xi_1+\xi_2+\xi_7+\xi_8) \]

\[ \tfrac{1}{4}(4q+18p) + b + \tfrac{1}{4}(\xi_3+\xi_4+\xi_5+\xi_6) \]

respectively. The difference of these two is independent of q and p.
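
The cancellation of the gradient can be confirmed for all three balanced arrangements at once. The following is a small sketch under stated assumptions: residuals are set to zero so that only the gradient and the treatment constants remain, the numerical values of q, p, a and b are arbitrary, and `mean_difference` is an illustrative name.

```python
from fractions import Fraction

def mean_difference(layout, q, p, a, b):
    """Mean yield of A plots minus mean yield of B plots when the plot in
    position k (counted from 1) yields q + k*p plus its treatment constant."""
    yields = {"A": [], "B": []}
    for k, treatment in enumerate(layout, start=1):
        effect = a if treatment == "A" else b
        yields[treatment].append(q + k * p + effect)
    mean_a = Fraction(sum(yields["A"]), len(yields["A"]))
    mean_b = Fraction(sum(yields["B"]), len(yields["B"]))
    return mean_a - mean_b

balanced = ["AABBBBAA", "ABBAABBA", "ABABBABA"]
q, p, a, b = 10, 3, 5, 2          # arbitrary illustrative values

for layout in balanced:
    print(layout, mean_difference(layout, q, p, a, b))
print("AAAABBBB", mean_difference("AAAABBBB", q, p, a, b))
```

For each balanced layout the difference equals a − b whatever q and p may be, while for an arrangement such as AAAABBBB it is shifted by a multiple of p.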


25.40. The alternative procedure in allotting treatments would be to distribute 
them at random. Such balanced arrangements as (25.30)-(25.32) might then arise by 
chance. But we might also get such an arrangement as 


    AAAABBBB        (25.33)


What are we to do in such circumstances ? If we reject this arrangement we are rejecting 
the random allocation of treatments in favour of systematisation. If we accept it we 
know quite well that a fertility gradient, if it exists, will invalidate the inquiry. 

The reader will no doubt agree that, if other things are equal, the balanced arrange- 
ment is better than the arrangement (25.33). What we have to examine is whether other 
things are equal; in short, whether in rejecting randomisation we have lost anything 
useful in the testing of significance. 


25.41. Consider a rather more general case in which an experimental area is laid
out in p blocks of q treatments each. If the subscript j refers to blocks and k to
treatments, we have the usual analysis with sum of squares between blocks (p − 1 d.f.), between
treatments (q − 1 d.f.), and residual ((p − 1)(q − 1) d.f.).

Now we have seen that if the individual plot-yield can be regarded as a block effect 
plus a treatment effect plus a normal residual with constant variance from plot to plot, 




the significance of treatment effects can be judged from the z-test in the usual way by 
comparing sum of squares between treatments with the residual sum of squares. This 
is true whether treatments are allocated at random or not. 

But suppose we wish to adopt the alternative viewpoint of 23.41 and make the infer- 
ence in the set of values obtained by permuting the observed values. These permutations 
will not affect the block means or the total mean, and hence the sum of squares between 
blocks remains constant. The remaining part of the analysis may be written— 


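\[
\begin{array}{lll}
 & \text{Sum of squares} & \text{d.f.} \\
\text{Treatment} & S_1 = p\sum_{k}(\bar{x}_{.k}-\bar{x}_{..})^2 & q-1 \\
\text{Residual} & S_2 = \sum_{j,k}(x_{jk}-\bar{x}_{j.}-\bar{x}_{.k}+\bar{x}_{..})^2 & (p-1)(q-1) \\
\text{Total} & S_3 = \sum_{j,k}(x_{jk}-\bar{x}_{j.})^2 & p(q-1)
\end{array}
\tag{25.34}
\]


Rather remarkably, the z-test holds for the ratio

\[ \frac{S_1/(q-1)}{S_2/\{(p-1)(q-1)\}}, \]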


provided that treatments are allocated at random, independently of the distribution of 
residual effects in individual plots. 


25.42. Consider, then, the population of values, (q!)^(p−1) in number, obtained by
permuting the observed values. The total sum of squares S₃ in (25.34) is the same for all
members. Consequently if S₁ is too great, S₂ must be too small and vice-versa; and in
general, if we confine ourselves to certain layouts and reject others, all the possible values
of S₁ cannot appear. It is this fact which has been seized on by advocates of
randomisation. They point out that for balanced layouts S₁ tends to be smaller than for random
layouts (a conclusion supported by experiment); consequently that the test of significance
is invalidated and the estimate of error S₂ too big. The difference between the two modes
of thought may be expressed briefly in this way: with balanced layouts the real error is
reduced but the estimate of error is too large, so that the significance of the result is more
in doubt; whereas with random layouts the estimate of error is exact but the error itself
may be larger. The question is whether one prefers to be nearer the truth without knowing
how near, or farther from the truth with a knowledge of the limits of error.
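
The permutation population of this section can be enumerated directly for a small layout. The sketch below is an illustration rather than the author's procedure: the yields in `observed` are invented, the first block is held fixed on the assumption that relabelling the treatments in every block simultaneously leaves the treatment sum of squares unchanged (which gives the (q!)^(p−1) arrangements), and the function names are hypothetical.

```python
from itertools import permutations, product

def sums_of_squares(table):
    """table[j][k] is the yield in block j under treatment k.
    Returns (S1, S2, S3): treatment, residual and within-block totals."""
    p, q = len(table), len(table[0])
    grand = sum(map(sum, table)) / (p * q)
    block_means = [sum(row) / q for row in table]
    treat_means = [sum(table[j][k] for j in range(p)) / p for k in range(q)]
    s1 = p * sum((m - grand) ** 2 for m in treat_means)
    s2 = sum((table[j][k] - block_means[j] - treat_means[k] + grand) ** 2
             for j in range(p) for k in range(q))
    s3 = sum((table[j][k] - block_means[j]) ** 2
             for j in range(p) for k in range(q))
    return s1, s2, s3

def permutation_distribution(table):
    """Sums of squares for every arrangement obtained by permuting the
    yields within each block, the first block being held fixed."""
    first, rest = list(table[0]), table[1:]
    results = []
    for combo in product(*(permutations(row) for row in rest)):
        results.append(sums_of_squares([first] + [list(r) for r in combo]))
    return results

# Invented yields: p = 3 blocks, q = 3 treatments.
observed = [[12.0, 15.0, 11.0], [14.0, 18.0, 13.0], [10.0, 16.0, 12.0]]

dist = permutation_distribution(observed)
s1_obs = sums_of_squares(observed)[0]
tail = sum(1 for s1, _, _ in dist if s1 >= s1_obs - 1e-9) / len(dist)
print(len(dist), tail)
```

Because S₃ is the same for every arrangement, ordering the arrangements by S₁ is equivalent to ordering them by the variance ratio, so the proportion `tail` serves as the significance level of the observed layout within the permutation population.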


25.43. For details of the controversy on this topic the reader may consult the papers 
referred to at the end of the chapter. It brings into prominence an important question 
of inference which can only be decided by the experimenter himself. If he chooses to 
regard any act of experimentation as one of a large population of such acts, to be carried 
out by himself or other workers, he may prefer randomisation in all circumstances, not- 
withstanding that every now and again he will hit by chance on a design which he knows 
is likely to give misleading results. But if he cannot take this very detached attitude (and 
most experimenters, being human, would think it poor compensation that their own errors 
are balanced by the better luck of other people) then he will prefer to design a balanced 
layout, even if the exactitude of his tests of significance is impaired. 




25.44. We must, however, not leave the reader with the impression that the 
desiderata of both schools of thought are totally incompatible. It frequently happens that 
one can select a design which is both balanced and random. The Latin square is a good 
example. By imposing the restriction that a treatment must not appear more than once 
in a row or column we remove to some extent the interference of fertility gradients; by 
requiring that it shall appear just once we balance the design; and by leaving the rest 
of the layout to be determined by a random selection from all possible Latin squares of 
that order we randomise so as to reproduce the distribution of the variance ratio in the 
required form, thus, as “Student” remarked, “conforming to all the principles of allowed
witchcraft”.


REFERENCES 


A classical case of how an inquiry can be spoilt by poor design is the Lanarkshire Milk
Investigation, for which see “Student” (1931c) and E. M. Elderton (1933). This case
will repay study. On some theoretical problems arising from the sampling of existent
populations see Bowley (1925), Jensen (1925), Sukhatme (1935), Neyman (1933b, 1934,
1938a, 1939a, 1941b), Olds (1939, 1940), and Frankel and Stock (1939). The war has
accentuated many of the points remaining unsolved, and there is much of general interest
in recent issues of the Journal of the American Statistical Association and the Annals of
Mathematical Statistics. For some work on the “pilot” sampling technique see Sukhatme
(1935) and C. Bose (1948).

Reference has been made in the text to Fisher’s Design of Experiments, Yates’ Principles
of Orthogonality and Confounding, and Goulden’s Methods of Statistical Analysis.

For the problem of size of sampling units see the papers by Neyman referred to above, 
particularly 1934, and for its effect on correlation analysis see an interesting appendix 
in Wold’s Analysis of Stationary Time Series. 

For the controversy on balance versus randomisation see “Student” (1938), Barbacki
and Fisher (1936), E. S. Pearson (1937b, 1938), and Jeffreys (1939c).


EXERCISES 


25.1. A population is given by specifying the frequencies in comparatively narrow
ranges of one variate, the frequency in the ith range being Nᵢ and ranges being of equal
width. Show that if the population frequencies are large, the best estimator of the mean
of a second variate which is linearly related to the first (in the sense of the unbiassed estimator
of minimum variance) in a sample obtained by taking nᵢ members from the ith range is
given when nᵢ is proportional to Nᵢ.


25.2. Extend the result of the previous exercise to the case where ranges are of 
unequal width. 

If the number of farms in England and Wales is known in the acreage ranges 0-49, 
50-99, 100-199, 200-499, 500 and over, what sampling proportions would you take in the 
various ranges to estimate the total acreage under wheat ? 




25.3. If a variate ξ can be regarded as the sum of a systematic component ξ(x) and
an uncorrelated random component ε₁, and η similarly as η(x) + ε₂, and if the random
components are uncorrelated with each other, show that

\[
r(\xi,\eta) = \frac{\operatorname{cov}\{\xi(x),\,\eta(x)\}}
{\sqrt{\{\operatorname{var}\xi(x)+\operatorname{var}\varepsilon_1\}\{\operatorname{var}\eta(x)+\operatorname{var}\varepsilon_2\}}}.
\]

Hence, if a population is divided into strata the correlation between ξ and η for these strata
will, in general, be less than that obtained by combining strata to obtain larger units;
and as the strata are further subdivided the correlation between ξ and η tends to zero.
(Spearman, 1907, Am. J. Psych., 18; Wold, 19382.) 


25.4. Illustrate the effect of the foregoing exercise by calculating the correlation 
coefficients for the data of Table 14.4 (vol. I, p. 333), (a) by adding the variates in pairs 
and so obtaining 24 values; (b) by repeating the operation and obtaining 12 values ; 
and (c) by repeating the operation and so obtaining 6 values. 


25.5. (Markoff’s theorem.) Consider a sample of independent values x₁, ..., xₙ,
xᵢ being drawn from a population Πᵢ with mean μᵢ and variance σᵢ². Suppose we have
a function θ defined by

\[ \theta = \sum_{j=1}^{s} b_j p_j \]

where the b’s are known and the parameters pⱼ depend on the μ’s according to the equation

\[ \mu_i = \sum_{j=1}^{s} a_{ij}\, p_j, \qquad s < n, \]


the a’s also being known. Then an unbiassed estimator of θ, say φ, with minimum variance
may be written

\[ \phi = \sum_{j=1}^{n} \lambda_j x_j. \]

Show that the function φ is given by substituting for the p’s in the expression for θ the
functions q given by minimising

\[ \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\Bigl(x_i - \sum_{j=1}^{s} a_{ij}\, q_j\Bigr)^2 \]

with regard to the q’s considered as independent variables.
Show further that if this minimum value is S, the estimated variance of ¢ is 


Bee ye a). 


n—s8s 
25.6. In a feeding experiment there are given five different foods, each of which is
available in four grades. It is desired to feed each animal with one grade of each food,
but only one, so that a comparison may be made of the effect of the different grades of any
particular food. Use the Graeco-Latin square to show how the feeding can be carried
out.




25.7. A water diviner is to be taken to ten spots and asked to say whether water 
is present below the surface. It is decided to choose five spots where water is known for 
certain to exist and five where it is known not to exist. The order in which the spots are 
to be presented is determined by spinning a coin, heads denoting water and tails not-water. 

The spinning of the coin results in the first five trials giving heads. Would you 
accept this result or spin again ? 


25.8. Show that a Latin square may be regarded as a three-way classification in
which p² members are not zero, but p³ − p² members vanish. Derive the analysis of
variance for the Latin square from this approach and generalise it to the Graeco-Latin
square.


CHAPTER 26 


GENERAL THEORY OF SIGNIFICANCE-TESTS—(1) 


Hypotheses to be Considered 


26.1. The kind of hypothesis which we test in statistics is more restricted than the 
general scientific hypothesis. It is a scientific hypothesis that every particle of matter 
in the universe attracts every other particle, or that Homer was blind; but these are not 
hypotheses such as arise for testing from the statistical viewpoint. A review of the various 
tests which have been introduced earlier in this book indicates that the great majority 
specify something about a population. Some merely assert a general fact such as “the
population is continuous” or “the population is rectangular”. Others are more definite,
as for instance “the population is normal and has a mean μ₀”; and again others are less
definite in one direction and more definite in another, e.g. “the population has unit
variance”. It is also usually a part of the hypothesis that the sample from which the inference
is being made was obtained by a random process.


26.2. Suppose we have a set of random variables x₁, ..., xₙ. In the sample space
W of n dimensions the sample-point whose co-ordinates are x₁, ..., xₙ determines a point
E, say, with a distribution function which we may write as P(E). If w is any region in
W, we may derive the probability that E falls in w, say P(E ∈ w). Then we shall say that
any hypothesis concerning the law P(E ∈ w) is a statistical hypothesis. If it determines
the law completely we shall call it simple. In the contrary case it is said to be composite.

For instance, in testing the significance of the mean of a sample of n, it is a statistical
hypothesis that the parent is normal. This is composite, as also is the hypothesis that
the parent is normal with mean μ or the hypothesis that the parent is normal with variance
σ². The hypothesis that the parent is normal with mean μ and variance σ² is simple because
then the parent is fully determined.


Example 26.1 

In sampling from a population dichotomised into classes possessing the attributes
A or not-A, say in proportion ϖ and χ (= 1 − ϖ), the sampling distribution is the binomial
(χ + ϖ)ⁿ. This is completely determined by the value of ϖ, and hence a hypothesis as
to the value of ϖ is simple. Such, for instance, would be the hypothesis that male and
female births occur in equal proportions. Similarly, in a multiple classification with
proportions ϖ₁, ϖ₂, ..., ϖₛ, a simple hypothesis would specify values for all the ϖ’s; if only
one were specified and s were greater than two the hypothesis would be composite.

In sampling from a bivariate normal population characterised by two means, two 
variances and a correlation, a hypothesis about any one parameter would be composite, 
and similarly for a hypothesis concerning two, three or four parameters. Only if all five 
were specified in addition to the normality of the parent would the hypothesis be simple ; 
and this notwithstanding the fact that the sampling distribution of the means is
independent of the other three parameters, and that of the correlation coefficient independent
of the other four.


26.3. A hypothesis which determines the law P(E ∈ w) completely except for ν
parameters is sometimes said to have ν degrees of freedom. Such a hypothesis may be
regarded as an aggregate of simple hypotheses. For instance, the hypothesis that a
population is normal with mean μ is the aggregate, for all σ², of hypotheses that it is normal with
mean μ and variance σ².


26.4. The kind of argument we have used in testing hypotheses, for both large and
small samples, is of this character: assuming that the hypothesis is true, we can, with
any assigned probability α, find a region w₀ in the sample space W such that the probability
of E falling in W − w₀ is α. We call W − w₀ the region of acceptance and the complementary
domain w₀ the critical region. (This is the nomenclature of Chapter 19.) If our observed
E falls in w₀ we reject the hypothesis; if not we accept it. As a rule, in practical cases,
our regions w₀ are determined by the values of some statistic such as x̄ in testing the mean.


Errors of First and Second Kind 


26.5. In general, as we saw in Chapter 19, there are many possible regions of
acceptance for any given hypothesis and any given probability level α. For all of them we shall
err in proportion 1 − α of the cases in the long run by rejecting the hypothesis if E falls
in the critical region, provided that the hypothesis is true. But what about the case when
it is not true? We cannot ignore this case, for its possible existence is the very reason for 
carrying out the test. It is of no use whatever to know merely what the test will do when 
the hypothesis is true without regard to its behaviour in the contrary case; for if we are 
to consider only the events which happen when the hypothesis is true we have no right to 
use a test based on that assumption to reject it. 

By having regard to the behaviour of the test when the hypothesis is not true we are
able to lay down criteria for choosing among the various tests obeying the rule

\[ P\{E \in w_0 \mid H_0\} = 1-\alpha \tag{26.1} \]

where H₀ is the hypothesis. In fact we shall seek for the test which, while obeying (26.1),
minimises the risk of accepting H₀ when an alternative hypothesis H₁ is true and H₀
accordingly is false. That is to say, we shall endeavour to find w₀ such that, in addition to (26.1),
we also have

\[ 1 - P\{E \in w_0 \mid H_1\} = \text{minimum}. \tag{26.2} \]


26.6. From a slightly different viewpoint we may say that there are two possible 
errors in judging a statistical hypothesis : 

(a) We may reject it when we ought to accept it, that is, when it is true. 

(b) We may accept it when we ought to reject it, that is, when it is false. 

These are known as errors of the first and second kind respectively. The error of the 
first kind we can control exactly by setting up the proper region of acceptance determined
by α. Errors of the second kind cannot be controlled in this way, but we can sometimes
calculate their probabilities, and in any case can try to reduce them to a minimum. This 
is the fundamental idea, first given explicit expression by Neyman and E. S. Pearson, 
which determines most of the work in the present and succeeding chapters. 


26.7. The possibility of finding regions of acceptance obeying (26.2) clearly depends 
on a precise specification of what alternative hypotheses are under consideration. We 
had better emphasise the importance of this point. It is customary to speak, and even, 




in a loose kind of way, to think of testing a hypothesis without reference to alternatives.
To take the case of testing for normality, we often say that the hypothesis under test is 
that the population is normal without specifying what other form it might have. The 
reader may say that the alternative he has in mind is merely the negation of the hypothesis, 
namely that the population is not normal. But if so he will find it very difficult—in my 
own view impossible—to justify any of his tests on a logical basis. He will calculate certain 
statistics and accept the hypothesis if their values are consonant with the normal values ; 
but it will always be possible to find other populations for which the observed values are 
even closer to expectation. If agreement between theoretical and observed values is the 
criterion he should reject normality in favour of these alternative hypotheses. It is not 
until he specifies his alternatives and considers errors of the second kind that some firm 
foundation for intuitive processes begins to appear. 

26.8. Perhaps it may help to clarify the fundamental concepts of the present approach
if we consider a simple illustration where the hypothesis under test H₀ is simple and there
is only one alternative H₁ which is also simple. In Fig. 26.1 we show diagrammatically the
scatter of sample-points which would arise in samples of two, x₁ and x₂, the cluster on the
right being that due to H₀ and the one on the left to H₁. In practice, of course, the sampling
distributions are more usually continuous, but the dots will indicate roughly the condensation
of sample density round central values.

Fig. 26.1 (see text).

In determining the critical region we have to find an area in the (x₁, x₂) plane such that
its “content” is 1 − α. Two possible areas are shown, w₀ being the area to the left of
the line PQ, and w₁ the area between the lines AB and BC. In either case the proportion
in the critical regions of the frequency on hypothesis H₀ is 1 − α, and if we reject H₀
whenever the sample-point falls in w₀ (and similarly for w₁) we shall commit an error of the first
kind in proportion 1 − α of the cases in the long run.

Consider errors of the second kind. By using the region w₀ we should reject H₀ (and
therefore accept H₁) every time the sample-point arose from H₁, that is to say in practically
all the cases where H₁ was true, since nearly all the sample-points arising from H₁ lie in
w₀. Errors of the second kind are therefore very rare. On the other hand, if we were to
use w₁ we should accept H₀ every time a sample-point arose from H₁ but did not fall between
the lines AB and BC, that is to say fairly frequently. Clearly w₀ is the better critical
region and has a much smaller error of the second kind than w₁.


26.9. It is to be noted that the argument does not depend on the relative frequencies
of occurrence of the hypotheses H₀ and H₁. This is generally true. There is no concealed
form of Bayes’ postulate in this approach.


26.10. When there are n variates and p unknown parameters the geometrical
representation can be extended by imagining a sample-space W of n dimensions adjoined to
a parameter space of p dimensions. We cannot draw a picture of such a case on a two- 
dimensional sheet of paper, but the geometrical imagery and terminology of the method 
are frequently useful. A graphical illustration of a two-dimensional sample-space and 
a one-dimensional parameter space has already been given in Fig. 19.3. 


The Power Function 
26.11. If for a simple hypothesis H₀, (26.1) is true we define

\[ P\{E \in w_0 \mid H_1\} = \beta(H_1 \mid w_0) \tag{26.3} \]

as the power of the critical region w₀ with respect to H₁. Clearly the power is greatest
when the probability of an error of the second kind is least.

In the expression on the left of (26.3) we regard the probability that E falls in w₀ as
dependent on H₁, the hypothesis alternative to H₀. In the expression on the right we have
regard to the power of the test for H₁ as dependent on w₀.

If there exists a particular region w₀ with greater power than any other region obeying
(26.1) we shall say that it is the best critical region, and the test based on it will be called
the most powerful test.


26.12. We proceed to consider in turn the following cases :— 

(a) H₀ simple; one alternative H₁ which is simple.

(b) H₀ simple; an alternative H₁ which is composite but can be regarded as an aggregate
of simple alternatives.

(c) H₀ and H₁ composite but expressible as aggregates of simple hypotheses.


Simple Hypotheses: One Simple Alternative 


26.13. Suppose the parent population is continuous, so that the simultaneous
distribution of the n sample values x₁, ..., xₙ is continuous; and let the frequency functions
of the sample values on hypotheses H₀ and H₁ be p₀(x₁, ..., xₙ) and p₁(x₁, ..., xₙ)
respectively. Write dx for the element dx₁ ... dxₙ. Then we have

\[ \int_{w_0} p_0\,dx = 1-\alpha \tag{26.4} \]

and wish to maximise, for variations in the domain w₀, the integral

\[ \int_{w_0} p_1\,dx. \tag{26.5} \]



This is a problem in the Calculus of Variations and is equivalent to maximising
unconditionally the integral

\[ \int_{w_0}\Bigl(p_1 - \frac{1}{k}\,p_0\Bigr)\,dx, \tag{26.6} \]

or, what is the same thing, to minimising

\[ \int_{w_0}(p_0 - k p_1)\,dx, \tag{26.7} \]

where k is a constant to be determined by (26.4).
It is known that the condition for a stationary value of (26.7) is that, on the
boundary of w₀,

\[ p_0 - k p_1 = 0. \tag{26.8} \]

If the solution is a minimum we have, inside w₀,

\[ p_0 \le k p_1 \tag{26.9} \]

and outside w₀,

\[ p_0 > k p_1. \tag{26.10} \]

This solution to the problem is fairly obvious on general grounds. If U is a function which
is sometimes positive and sometimes negative, with a line of demarcation where it is zero
(as must exist in virtue of continuity), we clearly minimise ∫ U dx by taking into the region
w₀ all the points for which U is negative and no more. This gives us (26.9) and (26.10),
and the boundary of w₀ is the locus for which U vanishes. By convention we regard the
boundary as included in w₀, which accounts for the equality in (26.9) and its absence in
(26.10).


26.14. The conditions expressed by (26.8), (26.9) and (26.10) are sufficient as well
as necessary. For let w₁ be any other region for which

\[ \int_{w_1} p_0\,dx = 1-\alpha. \]

If w₀ and w₁ have a common part denote it by w₀₁. Then

\[
\int_{w_0-w_{01}} p_0\,dx = 1-\alpha-\int_{w_{01}} p_0\,dx = \int_{w_1-w_{01}} p_0\,dx
\]

and hence, from (26.9) and (26.10),

\[
k\int_{w_0-w_{01}} p_1\,dx \;\ge\; \int_{w_0-w_{01}} p_0\,dx = \int_{w_1-w_{01}} p_0\,dx \;\ge\; k\int_{w_1-w_{01}} p_1\,dx.
\]

Adding to both sides \( k\int_{w_{01}} p_1\,dx \), we have

\[ k\int_{w_0} p_1\,dx \;\ge\; k\int_{w_1} p_1\,dx, \tag{26.11} \]

and hence, for positive k, the power of w₁ is less than that of w₀ and the latter is the best
critical region.


Both in this section and implicitly in the last we have required k to be positive. That
it must be so if w₀ is to exist emerges from (26.8), for p₀ and p₁ are essentially not negative,
and if k were negative no solution for real variate-values would exist.


Example 26.2 


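Consider the normal population

\[ dF = \frac{1}{\sqrt{2\pi}}\exp\{-\tfrac{1}{2}(x-\mu)^2\}\,dx, \qquad -\infty < x < \infty. \]

Let the hypothesis H₀ be that μ = a₀, and the alternative that μ = a₁. We have

\[ p_0 = \frac{1}{(2\pi)^{n/2}}\exp\Bigl\{-\tfrac{1}{2}\sum_{i=1}^{n}(x_i-a_0)^2\Bigr\}. \]

We can conveniently express this in terms of the sample mean x̄ and the sample variance
s², obtaining for the density function

\[ p_0 = \frac{1}{(2\pi)^{n/2}}\exp\Bigl[-\frac{n}{2}\{(\bar{x}-a_0)^2+s^2\}\Bigr]. \]

A similar expression is found for p₁ and thus, for the boundaries of the best critical region,
we have

\[
\frac{p_0}{p_1} = k = \exp\Bigl[-\frac{n}{2}\{(\bar{x}-a_0)^2-(\bar{x}-a_1)^2\}\Bigr]
= \exp\Bigl[\frac{n}{2}(a_0-a_1)(2\bar{x}-a_0-a_1)\Bigr].
\]

This yields for the critical region

\[ \frac{n}{2}(a_0-a_1)(2\bar{x}-a_0-a_1) \le \log k \]

or

\[ (a_0-a_1)\,\bar{x} \le \tfrac{1}{2}(a_0^2-a_1^2) + \frac{1}{n}\log k = (a_0-a_1)\,\bar{x}_0, \text{ say.} \]

If a₁ < a₀ the region is then defined by

\[ \bar{x} \le \bar{x}_0, \]

but if a₁ > a₀ it is defined by

\[ \bar{x} \ge \bar{x}_0. \]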

The reader should compare the two cases on a diagram similar to that of Fig. 26.1. 
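
The two cases can also be compared numerically. The following is a minimal sketch under stated assumptions: the parent has unit variance as in the example, `size` denotes the probability of an error of the first kind (1 − α in the notation of this chapter), and the function names are illustrative rather than taken from the text.

```python
from math import exp, sqrt
from statistics import NormalDist

def best_critical_region(a0, a1, n, size):
    """Cut-off for the sample mean, direction of rejection and power of the
    best critical region for H0: mean = a0 against H1: mean = a1."""
    std_err = 1 / sqrt(n)                  # standard error of the sample mean
    null = NormalDist(a0, std_err)
    alt = NormalDist(a1, std_err)
    if a1 < a0:
        cutoff = null.inv_cdf(size)        # reject when mean <= cutoff
        return cutoff, "<=", alt.cdf(cutoff)
    cutoff = null.inv_cdf(1 - size)        # reject when mean >= cutoff
    return cutoff, ">=", 1 - alt.cdf(cutoff)

def likelihood_ratio_constant(a0, a1, n, cutoff):
    """The constant k for which p0 = k * p1 on the boundary mean = cutoff."""
    return exp(n * ((a0 - a1) * cutoff - 0.5 * (a0 ** 2 - a1 ** 2)))

cutoff, direction, power = best_critical_region(a0=0.0, a1=1.0, n=4, size=0.05)
k = likelihood_ratio_constant(0.0, 1.0, 4, cutoff)
print(direction, round(cutoff, 4), round(power, 4), round(k, 4))
```

The cut-off depends only on a₀, n and the size of the region; the alternative a₁ enters only through the direction of the inequality and the power, which is the feature that makes a uniformly most powerful test possible for one-sided alternatives.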


Example 26.3 
Consider again the normal population when the mean is known, say zero, but the
variance unknown, e.g.

\[ dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\Bigl(-\frac{x^2}{2\sigma^2}\Bigr)\,dx, \qquad -\infty < x < \infty. \]



We now find, for hypotheses o = o, and o =o, 


pee eee On alee mies, | See 
Ps G) =) 2 ie aa 
which yields, for the best critical region, 


PA pa 
(2? + 88)(02 — 02) < aes log | (2)"t 


— <v (a2 — oi), say. 
Thus our critical regions are defined by 
Ms = #2 +82 <v ik oO, 0s 
m, =#2+s%* > if; 0, 106 
The best critical regions in the space W are thus bounded by hyperspheres centred at the 
origin. Whether we take the space inside or the space outside a particular hypersphere 
as the critical region depends on the alternative hypothesis. The probabilities concerned 
can be evaluated directly without evaluating the constants & and v. In fact, the proba- 
acd nv n (%* + 8?) ae : . 
bility of exceeding a given value of as 3 = 75 is obtainable from the y?-dis- 

0 0 
tribution with nm degrees of freedom, and hence the relation between v and « can be 
ascertained from the y?-integral. 
In this particular case we may find without difficulty the power of an alternative test
which would suggest itself on intuitive grounds. Suppose we find $\dfrac{(n-1)v'^2}{\sigma_0^2} = \chi_1^2$ from
the $\chi^2$-distribution corresponding to $n - 1$ degrees of freedom and probability level $\alpha$,
and use, instead of the hyperspheres centred at the origin, those centred at the sample mean
Sa, sy. 

Suppose that the alternative $H_1$ is that $\sigma_1^2 = 1.1\,\sigma_0^2$. In testing $H_0$ for the alternative
$\sigma_1 > \sigma_0$ we should, for the test based on $v$, find $\chi_0^2$ and accept $\sigma_0$ if

$$\frac{n(\bar{x}^2 + s^2)}{\sigma_0^2} \le \chi_0^2.$$

For instance, with $n = 5$, $1 - \alpha = 0.01$ we find $\chi_0^2 = 15.086$. The probability of an error
of the second kind is

$$\int_0^{\chi_0^2/1.1} dF(\chi^2),$$

i.e. is obtained from the $\chi^2$-integral with argument $\chi_0^2/1.1 = 13.71$, giving $\beta(H_1 \mid w_0) = 0.018$.

On the other hand, had we used $\chi_1^2$ instead of $\chi_0^2$ we should have entered the table with
four degrees of freedom, giving 13.277. Divided by 1.1 this gives 12.07, resulting in a
probability of rather less than 0.017. This is the power of the second test and is lower
than that of the first test, as of course it must be since the latter has maximum power.
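The two tail probabilities quoted in this example can be checked without tables. The following is a minimal sketch, not part of the original text: the helper name `chi2_upper_tail` is hypothetical, and it relies on the standard finite-series forms of the $\chi^2$ upper tail for integer degrees of freedom.

```python
import math

def chi2_upper_tail(x, df):
    """P(chi-square with integer df exceeds x), via finite-series closed forms."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df = 2m: exp(-x/2) * sum_{j<m} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) exp(-x/2) * sum_j x^(j-1/2) / (1*3*...*(2j-1))
    total, term = 0.0, math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

n, ratio = 5, 1.1   # sigma_1^2 = 1.1 * sigma_0^2, as in the example
power_origin = chi2_upper_tail(15.086 / ratio, n)     # hyperspheres centred at the origin
power_mean = chi2_upper_tail(13.277 / ratio, n - 1)   # hyperspheres centred at the sample mean
print(power_origin, power_mean)
```

Under these assumptions the first value exceeds the second, in agreement with the statement that the test based on hyperspheres centred at the origin has the greater power.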


Simple Hypotheses: Families of Simple Alternatives 

26.15. Consider now the case where $H_0$ is simple but $H_1$ is composite and consists
of a family of simple alternatives. The most frequently occurring case is the one in which
we have a class of simple hypotheses $\Omega$ of which $H_0$ is one and $H_1$ comprises the remainder;
for example, the hypothesis $H_0$ may be that a mean has some value $\mu_0$ and the hypothesis
$H_1$ that it has some other value unspecified.




For each of these other values we may apply the foregoing results and find for each $\alpha$
corresponding to any particular member of $H_1$, say $H_t$, a best critical region $w_t$. But this
region in general will vary from one $H_t$ to another. We obviously cannot determine a
different region for all the unspecified possibilities and are therefore led to inquire whether
there exists, among the family of best critical regions $w_t$, one which is the best for all of
them. Such a region is called the Uniformly Most Powerful and the test based on it the
Uniformly Most Powerful test, conveniently shortened to U.M.P. test.


26.16. Unfortunately, as we shall find below, the U.M.P. test does not usually
exist unless we restrict our family $\Omega$ in certain ways. Consider, for instance, the case
dealt with in Example 26.2. We found there that for $a_1 < a_0$ the best critical region for
a simple alternative was defined by

$$\bar{x} \le \bar{x}_0.$$

Now the boundaries of the regions determined by $\bar{x}$ = constant do not depend on $a_1$ and
can be found directly from the sampling distribution of $\bar{x}$ when the probability level $1 - \alpha$
is given. Consequently the regions defined by $\bar{x} \le \bar{x}_0$ are the same for all $a_1 < a_0$ and hence
the test is U.M.P. for the class of hypotheses that $a_1 < a_0$. It is difficult to see how a better
test could be devised, for, whatever $a_1$ subject to $a_1 < a_0$, the test controls errors of the first
kind and minimises those of the second.

However, if $a_1 > a_0$ the best critical regions are defined by $\bar{x} \ge \bar{x}_0$. Here again, if
our class $\Omega$ is confined to the values of $a_1$ greater than $a_0$ the test is U.M.P. But if $a_1$ can
be either greater or less than $a_0$ no U.M.P. test is possible. The reader will easily verify
for himself that the same is true for the test considered in Example 26.3.
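The failure of a single region to serve both sides can be seen numerically. The sketch below is illustrative only: the function names, the sample size `n = 9`, and the parameter `size` (the probability of the critical region under $H_0$, written $1 - \alpha$ in the text) are assumptions, and the population is taken to be normal with unit variance as in Example 26.2.

```python
from statistics import NormalDist

def upper_tail_power(a1, a0=0.0, n=9, size=0.05):
    """Power at mean a1 of the region xbar >= x0, with x0 fixed under mean a0."""
    sd = 1.0 / n ** 0.5                      # standard error of xbar, unit variance
    x0 = NormalDist(a0, sd).inv_cdf(1.0 - size)
    return 1.0 - NormalDist(a1, sd).cdf(x0)

def lower_tail_power(a1, a0=0.0, n=9, size=0.05):
    """Power at mean a1 of the region xbar <= x0, with x0 fixed under mean a0."""
    sd = 1.0 / n ** 0.5
    x0 = NormalDist(a0, sd).inv_cdf(size)
    return NormalDist(a1, sd).cdf(x0)

for a1 in (-0.5, 0.5):
    print(a1, upper_tail_power(a1), lower_tail_power(a1))
```

For an alternative above $a_0$ the upper-tail region is much the more powerful, while for an alternative below $a_0$ its power falls beneath its own size and the lower-tail region takes over, so neither region is best for both kinds of alternative.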


26.17. We now show formally that for a simple hypothesis depending on $\theta_0$ (the
value taken by the parameter $\theta$ defining a family of alternatives) no U.M.P. test exists
for both positive and negative values of $\theta - \theta_0$ if the frequency function $p(E \mid \theta)$ is continuous,
has everywhere a continuous derivative with respect to $\theta$ which does not vanish
identically, and admits of differentiation under the sign of integration over $W$.

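Suppose that such a test does exist. Then for any $\theta$ we have, inside $w_0$,

$$p_0 \le k\,p_\theta,$$

which we may write

$$p(E \mid \theta) \ge h(\theta)\,p_0(E \mid \theta_0). \tag{26.12}$$

Likewise, for any point $E'$ on the boundary of $w_0$ we have

$$p(E' \mid \theta) = h(\theta)\,p_0(E' \mid \theta_0). \tag{26.13}$$

By hypothesis $p$ is differentiable in $\theta$ and hence so is $h$. Moreover, as $\theta \to \theta_0$, $h(\theta) \to 1$.
Hence if

$$\Delta = \theta - \theta_0,$$

and primes denote differentiation with respect to $\theta$, we have

$$h(\theta) = 1 + \Delta\,[h']_{\theta_0 + q\Delta}, \qquad 0 < q < 1,$$
$$= 1 + \Delta\left[\frac{\partial}{\partial\theta}\,\frac{p(E' \mid \theta)}{p_0(E' \mid \theta_0)}\right]_{\theta_0 + q\Delta}$$
$$= 1 + \Delta\,\frac{[p'(E' \mid \theta)]_{\theta_0 + q\Delta}}{p_0(E' \mid \theta_0)}. \tag{26.14}$$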


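Further we have

$$p(E \mid \theta) = p_0(E \mid \theta_0) + \Delta\,[p'(E \mid \theta)]_{\theta_0 + r\Delta}, \qquad 0 < r < 1. \tag{26.15}$$

Substituting in (26.12) from (26.14) and (26.15), we find

$$\Delta\left\{[p'(E \mid \theta)]_{\theta_0 + r\Delta} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\,[p'(E' \mid \theta)]_{\theta_0 + q\Delta}\right\} \ge 0. \tag{26.16}$$

This is true for any $E$ inside $w_0$ and any $E'$ on its boundary, and for all $\Delta$, whatever its sign, and hence
the expression in curly brackets vanishes. Thus, letting $\Delta \to 0$, we have

$$[p'(E \mid \theta)]_{\theta_0} - \frac{p_0(E \mid \theta_0)}{p_0(E' \mid \theta_0)}\,[p'(E' \mid \theta)]_{\theta_0} = 0. \tag{26.17}$$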


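Similarly this equation may be shown to hold outside $w_0$, and hence it is true throughout $W$.

Now we have

$$\int_W p(E \mid \theta)\,dx = 1,$$

and hence, differentiating with respect to $\theta$ and putting $\theta = \theta_0$,

$$\int_W [p'(E \mid \theta)]_{\theta_0}\,dx = 0.$$

Substituting from (26.17), we have

$$\frac{[p'(E' \mid \theta)]_{\theta_0}}{p_0(E' \mid \theta_0)} \int_W p_0(E \mid \theta_0)\,dx = 0,$$

and hence

$$\frac{[p'(E' \mid \theta)]_{\theta_0}}{p_0(E' \mid \theta_0)} = 0. \tag{26.18}$$

Thus, from (26.17),

$$[p'(E \mid \theta)]_{\theta_0} = 0. \tag{26.19}$$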


But this implies that the derivative of $p$ with respect to $\theta$ is identically zero at $\theta_0$, which
is contrary to hypothesis. The theorem follows.

It may be noted that in deriving (26.17) from (26.16) we used the property that $\Delta$
may have either sign. If it can have only one sign, that is, if our class of admissible alternatives
is confined to the case when either $\theta < \theta_0$ or $\theta > \theta_0$, a U.M.P. test may exist; and
so we found in Examples 26.2 and 26.3.


Best Critical Regions and Likelihood 

26.18. Since on the boundary of a best critical region we have $p_0 - kp_1 = 0$, that
boundary is determined by the condition that on it the ratio of the likelihoods of two
functions corresponding to $H_0$ and $H_1$ is constant.

Consider now the case where $H_1$ comprises a set of alternatives varying according to
the parameter $\theta$, $H_0$ being one of them. In accordance with the principle of maximum
likelihood we should obtain, as the most likely value of $\theta$, the solution of

$$\frac{\partial p}{\partial\theta} = 0, \tag{26.20}$$




where $\theta$ is then expressed as a function of the variables. If this value is substituted in
$p$, we obtain the distribution with greatest likelihood which may be written $p(\Omega \text{ max.})$.
The surfaces of constant likelihood are defined for this distribution by

$$p_0 - \lambda\,p(\Omega \text{ max.}) = 0. \tag{26.21}$$

Now these surfaces are, in fact, the envelopes of the family, varying with $\theta$,

$$p_0 - k\,p_\theta = 0, \tag{26.22}$$

for to obtain the envelope we differentiate with respect to $\theta$, giving $\partial p_\theta/\partial\theta = 0$, and eliminate $\theta$,
leading back to (26.21). Thus, if there exists a best critical region (and hence a U.M.P.
test) for all permissible alternatives $H_1$, such a region will be the envelope with respect to
such alternatives and will therefore be identical with a region defined by (26.21); and
hence a test based on the principle of likelihood leads to best critical regions, if they exist.

If, as is more usual, there is no common best critical region, the ratio of the likelihood
of $H_0$ to that of any particular $H_1$ is $k$. The surface (26.21) remains the envelope of the
family of surfaces (26.22) for which $k = \lambda$.


Example 26.4 


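Consider once again the normal form, where both mean $\mu$ and variance $\sigma^2$ are specified
and the admissible alternatives are that they can have any values, subject of course to the
variance being positive. For any given $\mu_1$ and $\sigma_1$ the best critical region will be given by

$$\frac{p_0}{p_1} = \left(\frac{\sigma_1}{\sigma_0}\right)^n \exp\left[-\frac{n}{2}\left\{\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x} - \mu_1)^2 + s^2}{\sigma_1^2}\right\}\right] \le k$$

or

$$\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x} - \mu_1)^2 + s^2}{\sigma_1^2} \ge -\frac{2}{n}\log\left\{k\left(\frac{\sigma_0}{\sigma_1}\right)^n\right\}.$$

This may be written in the form

$$\frac{\sigma_1^2 - \sigma_0^2}{\sigma_0^2\sigma_1^2}\left\{(\bar{x} - p)^2 + s^2\right\} \ge \text{constant}$$

where

$$p = \frac{\mu_0\sigma_1^2 - \mu_1\sigma_0^2}{\sigma_1^2 - \sigma_0^2}.$$

Thus, if $\sigma_1 > \sigma_0$ we have

$$(\bar{x} - p)^2 + s^2 \ge v^2, \text{ say;}$$

and if $\sigma_1 < \sigma_0$ we have

$$(\bar{x} - p)^2 + s^2 \le v^2.$$

For any specified $\mu_1$ and $\sigma_1$ the best critical regions are bounded by hyperspheres with radius
$v\sqrt{n}$ and centre at $x_1 = x_2 = \ldots = x_n = p$. Owing to the fact that $p$ varies with $\mu_1$ and
$\sigma_1$, there will not in general be a best common critical region and a U.M.P. test; and this
remains true even if we limit our alternatives to $\sigma_1 < \sigma_0$ and $\mu_1 > \mu_0$ or by similar
inequalities.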


We may regard $\bar{x}$ and $s$ as independent variables and represent the data on a two-way
plane $(\bar{x}, s)$. The best critical regions are then seen to be bounded by circles with
centre $(p, 0)$ and radius $v$. Fig. 26.2 (adapted from Neyman and Pearson, 1933c) illustrates
some of the contours for particular cases. A single curve, corresponding to a single probability
level, is shown in each case.

Cases (1) and (2): $\sigma_1 = \sigma_0$ and $p = \pm\infty$. The best critical region lies on the right
of the line (1) if $\mu_1 > \mu_0$ and on the left of (2) if $\mu_1 < \mu_0$. This is the case discussed in
Example 26.2.

Case (3): $\sigma_1 < \sigma_0$, say $\sigma_1 = \tfrac{1}{2}\sigma_0$. Then $p = \mu_0 + \tfrac{4}{3}(\mu_1 - \mu_0)$ and the region lies
inside the semicircle marked (3).

Case (4): $\sigma_1 < \sigma_0$ and $\mu_1 = \mu_0$. The region is inside the semicircle (4).

Case (5): $\sigma_1 > \sigma_0$ and $\mu_1 = \mu_0$. The region is outside the semicircle (5).

Fig. 26.2. Contours of Constant Likelihood in a Two-dimensional Case. (See text.)

There is evidently no common best critical region for these cases. The regions of
acceptance, however, may have a common part, centred round the value $(\mu_0, \sigma_0)$, and we
should expect them to do so. Let us find the envelope of the best critical regions, which
is, of course, the same as that of the regions of acceptance. The likelihood ratio is


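$$k = \left(\frac{\sigma_1}{\sigma_0}\right)^n \exp\left[-\frac{n}{2}\left\{\frac{(\bar{x} - \mu_0)^2 + s^2}{\sigma_0^2} - \frac{(\bar{x} - \mu_1)^2 + s^2}{\sigma_1^2}\right\}\right].$$

The partial differentials with respect to $\mu_1$ and $\sigma_1$ equated to zero give

$$\frac{n}{\sigma_1} - \frac{n}{\sigma_1^3}\left\{(\bar{x} - \mu_1)^2 + s^2\right\} = 0,$$

$$\frac{n}{\sigma_1^2}(\bar{x} - \mu_1) = 0,$$

whence we find $\mu_1 = \bar{x}$ and $\sigma_1 = s$ and the envelope is

$$1 - \frac{2}{n}\log k = \left(\frac{\bar{x} - \mu_0}{\sigma_0}\right)^2 - \log\left(\frac{s}{\sigma_0}\right)^2 + \left(\frac{s}{\sigma_0}\right)^2.$$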




The dotted curve in Fig. 26.2 shows one such envelope. It touches the boundaries of all
the critical regions which have the same likelihood-ratio $k$. The space inside may be
regarded as a "good" region of acceptance and the space outside accordingly as a good
critical region.

There is no best region for all alternatives, but the regions determined by envelopes 
of likelihood-ratio regions effect a sort of compromise by picking out and amalgamating 
parts of critical regions which are best for individual alternatives. 
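The envelope relation of this example can be checked numerically. The sketch below is an illustration under stated assumptions: the function names and the numerical values are hypothetical, and $s^2$ is taken with divisor $n$ so that $\sum (x - a)^2 = n\{(\bar{x} - a)^2 + s^2\}$.

```python
import math

def log_ratio(xbar, s, mu0, sigma0, mu1, sigma1, n):
    """log(p0/p1) for a normal sample summarised by (xbar, s)."""
    a0 = ((xbar - mu0) ** 2 + s ** 2) / sigma0 ** 2
    a1 = ((xbar - mu1) ** 2 + s ** 2) / sigma1 ** 2
    return n * math.log(sigma1 / sigma0) - 0.5 * n * (a0 - a1)

def log_envelope(xbar, s, mu0, sigma0, n):
    """log of the ratio with the maximising values mu1 = xbar, sigma1 = s substituted."""
    return log_ratio(xbar, s, mu0, sigma0, xbar, s, n)

mu0, sigma0, n = 0.0, 1.0, 10      # illustrative values only
xbar, s = 0.3, 1.2
log_k = log_envelope(xbar, s, mu0, sigma0, n)

# Left and right sides of the envelope equation of the example.
lhs = 1.0 - (2.0 / n) * log_k
rhs = ((xbar - mu0) / sigma0) ** 2 - math.log((s / sigma0) ** 2) + (s / sigma0) ** 2
print(lhs, rhs)
```

Because $\mu_1 = \bar{x}$, $\sigma_1 = s$ maximise $p_1$, the envelope value of $\log(p_0/p_1)$ is never greater than the value for any other alternative, and it is zero at $(\bar{x}, s) = (\mu_0, \sigma_0)$.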


Example 26.5 


In the previous example we have supposed that the sample space $W$ was the same for
all admissible alternatives. This is quite legitimate, for we can always regard the domain
of variation as infinite by supposing that $p = 0$ outside the range of the frequency-distribution
of the variates. In the normal case, of course, $p$ does not vanish anywhere, so that
we are compelled to consider $W$ as infinite.

When, however, the sample-space for non-vanishing $p$ is bounded, special circumstances
may arise, and it is occasionally necessary to consider separately the different
discriminating regions. For instance, if the sample-spaces corresponding to $H_0$ and $H_1$
are $W_0$ and $W_1$, it may happen that $W_0$ and $W_1$ have no common part when both $p_0$ and
$p_1$ are greater than zero. If so, we can distinguish between $H_0$ and $H_1$ with certainty.
If there is a common region $W_{01}$, then $W_1 - W_{01}$ should be included in the best critical
region, for to do so reduces the probability of errors of the first kind. But it does not follow
that this should constitute the whole of the critical region, for we might then commit too
many errors of the second kind, i.e. accept $H_0$ too often when $H_1$ is true. We may then
wish to add to $W_1 - W_{01}$ a region $w_{01}$, making $w_0$ altogether, such that $w_{01}$ lies inside $W_{01}$
and $P_0(E \in w_{01}) = P_0(E \in w_0) = 1 - \alpha$. This controls the first kind of error to level $\alpha$
and reduces the second kind of error.

Consider the population

$$p(x) = \frac{1}{b}, \qquad a - \tfrac{1}{2}b \le x \le a + \tfrac{1}{2}b$$
$$\phantom{p(x)} = 0, \qquad \text{elsewhere.}$$

Suppose a sample of $n$ to have been drawn from a population of this kind where $b$ is known.
We wish to test whether $a$ has some value $a_0$ as against the alternative $a_1$.

The sample-spaces $W_0$ and $W_1$ are hypercubes centred at $a_0$ and $a_1$. If they have
a common part $W_{01}$, the probabilities $p_0$ and $p_1$ in that part are both proportional to the
volume and $p_0/p_1 = 1$ everywhere in the region. If, then, we take any region $w_{01}$ of content
$1 - \alpha$ in $W_{01}$ and add it to $W_1 - W_{01}$ we get a best critical region, and there are clearly
infinitely many such.

For the admissible alternatives $a_1$ the hypercube $W_1$ will move along the long diagonal
$x_1 = x_2 = \ldots = x_n$ as $a_1$ varies, and we cannot always find a common region of size $1 - \alpha$
to form $w_{01}$. By taking such a region as a hypercube of side $b(1 - \alpha)^{1/n}$, however, fitted
into one of the corners of $W_0$ lying on the long diagonal, we "nearly" obtain such an object
since this region provides what is required so long as $W_0$ and $W_1$ have a common part of
content $1 - \alpha$. Which corner we choose depends on whether the hypothesis is $a_1 > a_0$
or $a_0 > a_1$.
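The corner construction can be made concrete. In the sketch below, which is illustrative only, the function names are hypothetical and `size` stands for the content $1 - \alpha$ of the text; the check that the corner cube lies inside $W_1$ uses the fact that both hypercubes have side $b$ and are centred on the long diagonal.

```python
def corner_cube_side(b, size, n):
    """Side of the corner hypercube whose p0-content is `size` (density 1/b per axis)."""
    return b * size ** (1.0 / n)

def corner_cube_inside_w1(a0, a1, b, size, n):
    """True when the corner cube of W0 on the side of a1 lies wholly inside W1."""
    return abs(a1 - a0) <= b - corner_cube_side(b, size, n)

b, size, n = 2.0, 0.05, 4          # illustrative values only
side = corner_cube_side(b, size, n)
print(side, (side / b) ** n)       # the second number is the content under H0
```

The cube has content `size` under $H_0$ whatever $a_1$ may be, but it serves as part of a best critical region only while the shift $|a_1 - a_0|$ is small enough for it to remain inside $W_1$.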




Relation between U.M.P. Tests and Sufficient Estimators 

26.19. It was thought at one time that the existence of a set of U.M.P. tests for
a continuous range of admissible alternatives involved the existence of a sufficient estimator
for the parameter concerned. This does not appear to be true in full generality, but is
so in nearly all the cases occurring in statistical practice. We will prove a theorem on the
subject:

If a system of U.M.P. tests exists and if any point in the sample-space lies on the 
boundary of a best critical region, then a sufficient estimator exists for the parameter whose 
variation provides the admissible alternatives.* 

It is enough to show that for an arbitrary point we have

$$p_1(E) = h(t, \theta)\,p_0(E) \tag{26.23}$$

for then $t$ is sufficient for $\theta$ by definition. Now we know that on the boundary of a critical
region we have

$$p_1(E) = h\,p_0(E),$$

where $h$ varies with the $\alpha$'s and with $\theta$. We show that $h$ has the form $h(t, \theta)$ by defining
a function $t$ and showing that if $t$ has the same value at any two points $E_1$ and $E_2$, then

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

for all $\theta$.


26.20. For this purpose we require a lemma to the following effect: if a set of U.M.P.
tests exists, it will be said to be ordered if the condition $\alpha_1 > \alpha_2$ implies that the critical
region $w(\alpha_1)$ is included in the region $w(\alpha_2)$; and if a set of U.M.P. tests exists but is not
ordered we can always find another set which is.

$w(\alpha_1)$ and $w(\alpha_2)$ may include parts of $W$ where $p_0$ vanishes. Let the remaining parts
be $v(\alpha_1)$ and $v(\alpha_2)$ and, if $v_0$ is the common part of these regions, write

$$v(\alpha_1) = v_0 + v'$$
$$v(\alpha_2) = v_0 + v''$$

where $v_0$, $v'$ and $v''$ have no common points. Now for any value of $\theta$ and for any $E$ in $w(\alpha_1)$
(and therefore in $v'$) there is an $h_1$ such that

$$p_1(E) \ge h_1\,p_0(E) \text{ in } v'$$
$$\phantom{p_1(E)} \le h_1\,p_0(E) \text{ outside, and therefore in } v''. \tag{26.24}$$

Similarly, within $w(\alpha_2)$ and hence within $v''$ we have an $h_2$ such that

$$p_1(E) \ge h_2\,p_0(E) \text{ in } v''$$
$$\phantom{p_1(E)} \le h_2\,p_0(E) \text{ in } v'.$$

It follows that, from the inequalities deriving from $v''$, $h_1 \ge h_2$, and similarly, from $v'$,
$h_2 \ge h_1$. Hence $h_1 = h_2 = h$, say, and

$$p_1(E) = h\,p_0(E) \tag{26.25}$$

within $v'$ and $v''$ for any $\theta$.

* The theorem remains true if there is a set of points of measure zero for which the condition as to
boundaries is not fulfilled. It is also true for several parameters, as may be seen by an easy generalisation
of the argument. See Neyman and Pearson (1936a).


Now take

$$u(\alpha_1) = v_0 + u'' \tag{26.26}$$

such that

$$\int_{u(\alpha_1)} p_0\,dx = 1 - \alpha_1. \tag{26.27}$$

This is always possible, for the integral of $p_0$ over $v_0 + v''$ is $1 - \alpha_2$, which is greater than
$1 - \alpha_1$. It follows from (26.27) and the first equation of (26.24) that

$$\int_{u''} p_0\,dx = \int_{v'} p_0\,dx. \tag{26.28}$$

Now put

$$w'(\alpha_1) = W_0 + u(\alpha_1) = W_0 + v_0 + u'',$$

where $W_0$ is the part of $W$ for which $p_0 = 0$. Then from (26.27)

$$\int_{w'(\alpha_1)} p_0\,dx = 1 - \alpha_1.$$

Further, $w'(\alpha_1)$ is a best critical region with respect to admissible alternatives, for (26.25)
and (26.28) imply that

$$\int_{u''} p_1\,dx = \int_{v'} p_1\,dx,$$

and hence

$$\int_{u(\alpha_1)} p_1\,dx = \int_{v(\alpha_1)} p_1\,dx.$$

Finally, $w'(\alpha_1)$ is wholly included in $w(\alpha_2)$.
We have therefore replaced the region $w(\alpha_1)$ by another region $w'(\alpha_1)$ with the same
properties except that it is included in $w(\alpha_2)$. The lemma follows.


26.21. To return now to the main proposition, let $E$ be any point of $W$. If it belongs
to only one boundary of a best critical region with content $1 - \alpha$ we put $t(E) = 1 - \alpha$.
If it belongs to more than one, we put $t(E)$ equal to the mean between the upper and lower
bounds of values of $1 - \alpha$ for which the boundaries include $E$. In virtue of the lemma,
this implies that whatever the value of $1 - \alpha$ between these bounds, the corresponding
boundary must contain $E$.

Thus $t$ is defined everywhere. Further, if it has the same value at two points $E_1$ and
$E_2$ these points must lie on the same boundary. It follows that on this boundary

$$\frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}$$

and hence the theorem is proved.

The converse is not generally true, but one has to exercise some ingenuity and import
some artificiality to construct examples where it fails. Cf. Exercises 26.3 and 26.4.


Composite Hypotheses 
26.22. We shall consider a class $\Omega$ of admissible hypotheses depending on $r + s$
parameters $\theta_1 \ldots \theta_r \ldots \theta_{r+s}$, and shall regard the hypothesis $H_0$ under test as one of
this class. A composite hypothesis of $r$ degrees of freedom is one for which $s$ of the parameters,
say $\theta_{r+1} \ldots \theta_{r+s}$, are specified, the hypotheses determining the distribution
apart from the unspecified parameters. For example, the hypothesis that a population
is normal with specified mean, nothing being supposed about the variance, is a composite
hypothesis of one degree of freedom. It will be assumed that any admissible simple alternative
is given by specifying the other $r$ parameters $\theta_1 \ldots \theta_r$ and that there is a common
sample-space $W$ for all such alternatives.


Regions Similar to the Sample Space 


26.23. In order to test the composite hypothesis $H_0$ we need in the first place to
control errors of the first kind by determining a critical region $w$, such that

$$\int_w p_0\,dx = 1 - \alpha. \tag{26.29}$$

This, however, differs from the simple case in that $p_0$ can vary according to the unknown
parameters, and to be certain of controlling the error we must be able to find $w$ such that
(26.29) is true whatever $\theta_1 \ldots \theta_r$. If this can be done we shall call the region $w$ similar
to the sample-space $W$ and shall speak of $1 - \alpha$ as its size.

The problem of testing composite hypotheses then becomes one of (a) finding the
similar regions, and (b) selecting from among those regions the one which minimises the
second kind of error for a simple admissible alternative $H_t$. If this is the same for all
$H_t$ we shall have a common best critical region.


26.24. We consider in the first place the composite hypothesis with one degree of
freedom. The general problem of finding similar regions in such a case has not been solved,
but a solution is possible in one important class of case, namely, that for which

(a) $p_0$ is indefinitely differentiable with respect to $\theta_1$ for almost all values of $\theta_1$,

(b) the function $p_0$ obeys the relation

$$\phi' = A + B\phi, \tag{26.30}$$

where

$$\phi = \frac{\partial}{\partial\theta_1}\log p_0, \qquad \phi' = \frac{\partial\phi}{\partial\theta_1}, \tag{26.31}$$

and $A$ and $B$ depend on $\theta_1$ but not on the $x$'s. In particular the normal distribution
is of this type.

Under conditions (a) and (b) it follows that for $w$ to be similar to $W$ it is necessary and
sufficient that

$$\frac{\partial^k}{\partial\theta_1^k}\int_w p_0\,dx = 0, \qquad k = 1, 2, \ldots \tag{26.32}$$

Let $w$ be a region for which (26.32) is true. Then for $k = 1$ and 2 we have

$$\int_w p_0\,\phi\,dx = 0,$$

$$\int_w p_0\,(\phi^2 + \phi')\,dx = 0.$$

In virtue of (26.30), this last may be written

$$\int_w p_0\,(\phi^2 + A + B\phi)\,dx = 0,$$

whence

$$\int_w p_0\,\phi^2\,dx = -A\int_w p_0\,dx = -A(1 - \alpha). \tag{26.33}$$

Differentiating (26.33) with respect to $\theta_1$ and using previous results, we find

$$\int_w p_0\,\phi^3\,dx = (2AB - A')(1 - \alpha), \tag{26.34}$$

and generally

$$\int_w p_0\,\phi^k\,dx = (1 - \alpha)\,\psi_k(\theta_1), \tag{26.35}$$

where $\psi_k(\theta_1)$ is a function of $\theta_1$ only, and is therefore independent of $w$. Now (26.32) is
true for $W = w$, and we find

$$\int_W p_0\,\phi^k\,dx = \psi_k(\theta_1), \tag{26.36}$$

so that

$$\frac{1}{1 - \alpha}\int_w p_0\,\phi^k\,dx = \int_W p_0\,\phi^k\,dx. \tag{26.37}$$

Now consider the random variable $\phi$. Since $p_0$ integrated through $w$ is equal to $1 - \alpha$,
we may regard $\dfrac{p_0}{1 - \alpha}$ as a frequency function defined in $w$. It follows from (26.37) that
the moments of $\phi$ in this domain are the same as those of $\phi$ in $W$. Consequently, if the
moments determine the distribution uniquely, the distributions of $\phi$ are identical.

Hence we may use the hypersurfaces $\phi$ = constant to set up similar regions. The
space $W$ may be imagined as composed of shells of infinite thinness bounded by these
hypersurfaces. If we determine an "area" on one of these shells equal to $1 - \alpha$ times
its area in $W$, the totality of such areas will constitute a region $w$ of size $1 - \alpha$; and since
this will be so irrespective of $\theta_1$ the region $w$ is similar to $W$.


26.25. When similar regions are determined by the above method we have to find
the best critical region from among them. Let $H_t$ be a simple admissible alternative.
We require to find from the regions $w$ a region $w_0$ such that

$$\int_{w_0} p_t\,dx = \text{maximum}. \tag{26.38}$$

We now show that this is equivalent to maximising

$$\int_{w(\phi)} p_t\,dw(\phi) \tag{26.39}$$

subject to

$$\int_{w(\phi)} p_0\,dw(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi). \tag{26.40}$$

Here $w(\phi)$ means the element of $w$ for constant $\phi$, the "shell" of the previous section.
The object of this is to reduce our present case to that of simple hypotheses. We take
$\phi$ as a new variable and consider together the remaining variables (which amounts to determining
similarity of $w$ and $W$ in each separate shell between $\phi$ and $\phi + d\phi$, as in the previous
section), and are thus left with regions dependent on $\phi$. Equation (26.39) then requires
that the probability of the second kind of error in each shell must be a minimum, subject
to the control of the first kind asserted by (26.40).


Suppose that (26.39) were not maximised. There would then exist a set of values of
$\phi$ for each of which we could determine a region $v(\phi)$ such that

$$\int_{v(\phi)} p_0\,dv(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi) \tag{26.41}$$

and

$$\int_{v(\phi)} p_t\,dv(\phi) > \int_{w_0(\phi)} p_t\,dw_0(\phi). \tag{26.42}$$

Let $E$ be this set of values of $\phi$ and $CE$ the remaining set. We prove our result by obtaining
a contradiction, namely by defining a region $v$ which is similar to $W$, and such that

$$\int_v p_t\,dx > \int_{w_0} p_t\,dx, \tag{26.43}$$

which contradicts (26.38).

Take as $v$ the shells of hypersurfaces (1) in $CE$ which are identical with $w_0(\phi)$ and
(2) in $E$ which satisfy (26.42). Now

$$\int_v p_t\,dx = \int_{CE} d\phi\int_{w_0(\phi)} p_t\,dw_0(\phi) + \int_E d\phi\int_{v(\phi)} p_t\,dv(\phi)$$

and

$$\int_{w_0} p_t\,dx = \int_{E + CE} d\phi\int_{w_0(\phi)} p_t\,dw_0(\phi).$$

Hence

$$\int_v p_t\,dx - \int_{w_0} p_t\,dx = \int_E d\phi\left\{\int_{v(\phi)} p_t\,dv(\phi) - \int_{w_0(\phi)} p_t\,dw_0(\phi)\right\} > 0, \tag{26.44}$$

which is the contradiction required.


26.26. Thus our problem is reduced to that of finding, in the shells $\phi$ = constant,
portions $w_0(\phi)$ which maximise the integral of $p_t$. We have, so to speak, brought the
problem down one dimension by locating it in shells instead of dealing with it throughout
the spaces $w$ and $W$. It now becomes that of a simple hypothesis in $(n - 1)$ dimensions,
and the best critical region is the one for which

$$p_t \ge \frac{1}{k}\,p_0, \tag{26.45}$$

where $k$ is a function of $\phi$. The sum of these regions for the various values of $\phi$ gives us
the complete solution to the problem, and if this sum has boundaries which are independent
of $H_t$ we have a common best critical region and a U.M.P. test.


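Example 26.6: "Student's" Hypothesis

A single sample is taken from a normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\}dx$$

with unspecified $\sigma$. We have then one degree of freedom, $\theta_1 = \sigma$, and the hypothesis $H_0$
is that $\mu = \mu_0$, say.

We find

$$\phi = \frac{\partial}{\partial\sigma}\log p_0 = -\frac{n}{\sigma} + \frac{\sum(x - \mu_0)^2}{\sigma^3} = -\frac{n}{\sigma} + \frac{n}{\sigma^3}\left\{(\bar{x} - \mu_0)^2 + s^2\right\},$$

$$\phi' = \frac{\partial\phi}{\partial\sigma} = \frac{n}{\sigma^2} - \frac{3\sum(x - \mu_0)^2}{\sigma^4} = -\frac{2n}{\sigma^2} - \frac{3\phi}{\sigma}.$$

Condition (26.30) is satisfied, and $\phi$ is constant over the hypersurfaces

$$\sum(x - \mu_0)^2 = n\left\{(\bar{x} - \mu_0)^2 + s^2\right\} = \text{constant}.$$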


The hypersurfaces are hyperspheres in $W$. To construct a similar region we have merely
to pick out a region of size $1 - \alpha$ on each shell and to amalgamate them. In our present
case this is particularly easy because $p_0$ is constant over the shells and we need only pick
out areas on each shell bearing to the area of the hypersphere the ratio $1 - \alpha$.
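That $\phi$ in this example obeys the linear relation (26.30), with $A = -2n/\sigma^2$ and $B = -3/\sigma$, can be confirmed by a finite-difference check. The sketch below is illustrative only; the function names, the sample and the step `h` are assumptions.

```python
def phi(sample, mu0, sigma):
    """phi = d(log p0)/d(sigma) for a normal sample with hypothetical mean mu0."""
    n = len(sample)
    q = sum((x - mu0) ** 2 for x in sample)
    return -n / sigma + q / sigma ** 3

def phi_prime_numeric(sample, mu0, sigma, h=1e-5):
    """Central-difference estimate of d(phi)/d(sigma)."""
    return (phi(sample, mu0, sigma + h) - phi(sample, mu0, sigma - h)) / (2 * h)

def phi_prime_linear(sample, mu0, sigma):
    """A + B*phi with A = -2n/sigma^2 and B = -3/sigma."""
    n = len(sample)
    return -2 * n / sigma ** 2 - 3 * phi(sample, mu0, sigma) / sigma

sample, mu0, sigma = [1.2, 0.4, 2.1, 1.7, 0.9], 0.5, 1.3   # illustrative values only
print(phi_prime_numeric(sample, mu0, sigma), phi_prime_linear(sample, mu0, sigma))
```

The two printed values agree to within the accuracy of the difference quotient, and neither $A$ nor $B$ involves the observations.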

These areas need not be of the same shape or similarly situated. By selecting them
in different ways an infinite variety of regions may be constructed. We have to find the
best for an alternative simple hypothesis $\sigma = \sigma_1$, $\mu = \mu_1$.

The condition (26.45) becomes

$$\frac{1}{\sigma_1^n}\exp\left[-\frac{n}{2\sigma_1^2}\left\{(\bar{x} - \mu_1)^2 + s^2\right\}\right] \ge \frac{1}{k}\,\frac{1}{\sigma^n}\exp\left[-\frac{n}{2\sigma^2}\left\{(\bar{x} - \mu_0)^2 + s^2\right\}\right].$$

As we are dealing with regions which are similar with regard to $\sigma$, we may put $\sigma = \sigma_1$
and find

$$\bar{x}\,(\mu_1 - \mu_0) \ge \tfrac{1}{2}(\mu_1^2 - \mu_0^2) - \frac{1}{n}\,\sigma_1^2\log k = (\mu_1 - \mu_0)\,k_1, \text{ say,}$$

where $k_1 = k_1(\phi)$. Thus we find, for the boundary of $w_0(\phi)$,

$$\text{if } \mu_1 > \mu_0, \quad \bar{x} \ge k_1(\phi)$$
$$\text{if } \mu_1 < \mu_0, \quad \bar{x} \le k_1(\phi),$$

where $k_1$ has to be chosen so as to satisfy

$$\int_{w(\phi)} p_0\,dw(\phi) = (1 - \alpha)\int_{W(\phi)} p_0\,dW(\phi).$$

Thus on any particular shell the "cap" cut off by the hyperplane $\bar{x}$ = constant must have
area $1 - \alpha$ and hence must subtend the same solid angle at the origin. Consequently the
boundaries lie on a right hypercircular cone through the point whose co-ordinates are all
equal to $\mu_0$ and whose axis is perpendicular to $\bar{x} = 0$, namely the line $x_1 = x_2 = \ldots = x_n$.
For each $\alpha$ there will be a different cone. If $\mu_1 > \mu_0$ the cones will be in the positive
quadrant and in the contrary case in the negative quadrant.

Furthermore, these regions are independent of $\mu_1$. Thus for the class of hypothesis
$\mu_1 > \mu_0$ or $\mu_1 < \mu_0$ (but not both together) the common best critical regions and U.M.P.
tests exist.

Finally we have to evaluate $\alpha$ in terms of the sample values determining the critical
cones. We have already seen in Example 10.6 (vol. I, p. 239) that if $z = \dfrac{\bar{x} - \mu_0}{s}$ the
frequency inside the cone is

$$\frac{1}{B\left(\frac{n-1}{2}, \frac{1}{2}\right)}\int_z^\infty \frac{dz}{(1 + z^2)^{n/2}}.$$

Thus "Student's" test, which we have previously considered on more or less intuitive
grounds, is now seen to be the best in the sense of the theory herein developed, for the
admissible class $\mu_1 > \mu_0$ or for that $\mu_1 < \mu_0$.
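The geometry of the critical cones can be illustrated numerically. The sketch below rests on assumptions: the function names and the sample are hypothetical, and $s^2$ is taken with divisor $n$. It shows that $z$ is unchanged when the deviations from $\mu_0$ are all multiplied by the same positive constant (so that a region defined by $z$ is similar with regard to $\sigma$), and that $z$ determines the angle between the sample point and the axis of the cone.

```python
import math

def student_z(sample, mu0):
    """z = (xbar - mu0) / s, with s^2 the sample variance on divisor n."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / n)
    return (xbar - mu0) / s

def cone_cosine(sample, mu0):
    """Cosine of the angle between (x - mu0) and the line x_1 = ... = x_n."""
    n = len(sample)
    dev = [x - mu0 for x in sample]
    return sum(dev) / math.sqrt(n * sum(d * d for d in dev))

sample, mu0 = [1.2, 0.4, 2.1, 1.7, 0.9], 0.5             # illustrative values only
stretched = [mu0 + 3.0 * (x - mu0) for x in sample]      # same direction, three times the scale
z = student_z(sample, mu0)
print(z, student_z(stretched, mu0), cone_cosine(sample, mu0))
```

The cosine equals $z/\sqrt{1 + z^2}$, so a boundary $z$ = constant is a cone of fixed semi-angle about the line $x_1 = x_2 = \ldots = x_n$, with vertex at the point whose co-ordinates are all $\mu_0$.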


Example 26.7 


Consider a sample from the normal population with unspecified mean, the hypothesis
being that $\sigma = \sigma_0$. We now find

$$\phi = \frac{\partial}{\partial\mu}\log p_0 = \frac{n(\bar{x} - \mu)}{\sigma_0^2}, \qquad \phi' = -\frac{n}{\sigma_0^2},$$

so that (26.30) is satisfied.

The hypersurfaces $\phi$ = constant are the hyperplanes $\bar{x}$ = constant, and any regions
of size $1 - \alpha$ on these hyperplanes will provide similar regions $w$. The condition $p_t \ge \dfrac{1}{k}\,p_0$
will be found to reduce to

$$s^2(\sigma_1^2 - \sigma_0^2) \ge -(\bar{x} - \mu_1)^2(\sigma_1^2 - \sigma_0^2) + 2\sigma_0^2\sigma_1^2\left\{\log\frac{\sigma_1}{\sigma_0} - \frac{1}{n}\log k\right\} = (\sigma_1^2 - \sigma_0^2)\,k_1, \text{ say.}$$

If $\sigma_1 > \sigma_0$ we have $s^2 \ge k_1(\phi)$
and if $\sigma_1 < \sigma_0$ we have $s^2 \le k_1(\phi)$.

Since $s^2$ is independent of $\bar{x}$, $k_1$ will be a function of $\alpha$ and $n$ only. The best critical
regions are those given by $s^2 \ge s_0^2$ and $s^2 \le s_0^2$ as the case may be, and the appropriate
values of $s_0$ corresponding to $\alpha$ may be found from the known distribution of $s^2$. The
critical regions are hypercylinders, and again there are two sets of best common critical
regions, according as $\sigma_1 > \sigma_0$ or $\sigma_1 < \sigma_0$.


Composite Hypotheses: Several Degrees of Freedom 


26.27. As a preliminary to extending the theory for one degree of freedom to the
case of several degrees, we note that if a region $w$ is similar to $W$ with regard to $\theta_1 \ldots \theta_r$
jointly, then it is so for each of them separately; and conversely. The direct result is
obvious and the converse follows in this way (we need prove it only for $r = 2$ because
the rest follows step by step). If then

$$\int_w p_0\,dx = 1 - \alpha$$

is true for $\theta_2, \theta_3 \ldots \theta_r$ independently of $\theta_1$, and for $\theta_1, \theta_3 \ldots \theta_r$ independently of $\theta_2$,
then it is true for any values of $\theta_1$ and $\theta_2$ and any other fixed values of $\theta_3 \ldots \theta_r$; and
hence it is true independently of $\theta_1$ and $\theta_2$ together.


26.28. An additional preliminary requirement is the concept of independence of
a family of surfaces of a parameter. Suppose

$$f_i(x_1 \ldots x_n, \theta) = C_i, \qquad i = 1 \ldots k \tag{26.46}$$

represents a family of surfaces, where $\theta$ and the $C$'s are variable parameters. Let
$S(\theta, C_1 \ldots C_k)$ be the intersection of these surfaces, or, if $k = 1$, the surfaces themselves.

Consider the family obtained by fixing $\theta$ and allowing the $C$'s to vary. Then if any surface
of this family for $\theta_1$ can also be obtained from a second family for $\theta_2$ we shall say that the
family is independent of $\theta$. We get the same aggregate of intersections however $\theta$ is chosen.
For example, if

$$f_1 = (x_1 - \theta)^2 + (x_2 - \theta)^2 + (x_3 - \theta)^2 = C_1$$

and

$$f_2 = x_1 + x_2 + x_3 = C_2,$$

the family $S$ consists of circles in planes at right angles to the line $x_1 = x_2 = x_3$ and having
their centres on that line. This is true however $\theta$ is chosen, and $S$ is therefore
independent of $\theta$.
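The three-variable example can be verified directly. The sketch below assumes the second surface to be $x_1 + x_2 + x_3 = C_2$, as in the text above, and uses hypothetical function names; it rests on the identity $\sum(x_i - \theta)^2 = \sum(x_i - m)^2 + 3(m - \theta)^2$ with $m = C_2/3$.

```python
def circle_of_intersection(theta, c1, c2):
    """Return (m, r2): the circle S(theta, C1, C2) has centre (m, m, m) and squared radius r2."""
    m = c2 / 3.0
    r2 = c1 - 3.0 * (m - theta) ** 2
    return m, r2

def equivalent_c1(theta_old, c1, c2, theta_new):
    """C1' for which S(theta_new, C1', C2) is the same circle as S(theta_old, C1, C2)."""
    m = c2 / 3.0
    return c1 - 3.0 * (m - theta_old) ** 2 + 3.0 * (m - theta_new) ** 2

original = circle_of_intersection(0.0, 5.0, 3.0)             # illustrative values only
c1_new = equivalent_c1(0.0, 5.0, 3.0, 4.0)
relabelled = circle_of_intersection(4.0, c1_new, 3.0)
print(original, c1_new, relabelled)
```

Changing $\theta$ merely relabels the circles by a different value of $C_1$; the aggregate of circles is the same, which is what is meant by saying that $S$ is independent of $\theta$.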


26.29. Under certain restrictive conditions similar to those of 26.24 it is now possible
to find solutions to the problem of determining best critical regions. We assume

(1) that $\dfrac{\partial^k p_0}{\partial\theta_j^k}$ exists almost everywhere for all $k$ and $j = 1 \ldots r$;

(2) that if $\phi_j = \dfrac{\partial}{\partial\theta_j}\log p_0$ and $\phi_j' = \dfrac{\partial\phi_j}{\partial\theta_j}$,

then

$$\phi_j' = A_j + B_j\phi_j; \tag{26.47}$$

(3) that the family of surfaces given by the intersections of $\phi_i = C_i$ is independent of
$\theta_j$ for $j = 1 \ldots r$.

Subject to these conditions (which are sufficient but not necessary) similar regions exist.
Consider any two surfaces $\phi_1$ and $\phi_2$. Since $w$ is similar with respect to $\theta_1$ alone, we may
find surfaces $\phi_1$ = constant and

$$\int_{w(\phi_1)} p_0\,dw(\phi_1) = (1 - \alpha)\int_{W(\phi_1)} p_0\,dW(\phi_1). \tag{26.48}$$

In accordance with assumption (3), the family of surfaces $\phi_1 = C_1$ is independent of $\theta_2$.
Thus if $\theta_2$ varies, $W(\phi_1)$ and $w(\phi_1)$ will not vary, though perhaps they may correspond to
other values of $C_1$. Furthermore, (26.48) is true regardless of $\theta_2$. Hence within the shell
$\phi_1$ = constant we can repeat the analysis used for one degree of freedom. We find that
the necessary and sufficient condition for $w$ to be similar to $W$ with regard to both $\theta_1$ and $\theta_2$ is

$$\int_{w(\phi_1, \phi_2)} p_0\,dw(\phi_1, \phi_2) = (1 - \alpha)\int_{W(\phi_1, \phi_2)} p_0\,dW(\phi_1, \phi_2), \tag{26.49}$$

where $W(\phi_1, \phi_2)$ is the intersection of $\phi_1 = C_1$, $\phi_2 = C_2$ for any values of $C_1$ and $C_2$; and similarly
for $w$.

As before, the most general region $w$ is obtained by amalgamating the portions of size
$(1 - \alpha)$ on the intersections of $\phi_1$ and $\phi_2$. The generalisation to $r$ degrees of freedom is
immediate. It also follows in the usual way that the best critical region is the one for
which

$$\int_{w_0} p_t\,dx \ge \int_v p_t\,dx,$$

$v$ being any other region of size $1 - \alpha$; and $w_0$ is defined by

$$p_t \ge h(\phi_1 \ldots \phi_r)\,p_0. \tag{26.50}$$

The following examples will illustrate the theory.
The following examples will illustrate the theory. 


Example 26.8. Ratio of Two Variances 
Suppose we have two samples of 7,, 7, members from independent normal populations 
whose means and variances are unknown. The joint distribution may be expressed as 


ye 


Consider the composite hypothesis o, = o, =o, say. This has three degrees of freedom, 
for 44, 4, and o are unspecified. As the alternative H, we will take 


n a n fe 
exp [= 3g2 { (4 — fix)? + af} — a2 {(€2 — fa)? + } |. 


n Te 
om ons 


,= Pay 6, = fam fr = bi, 6, = 01, i,=—-, 
and for H, itself 
6, = HB, 0, = b, 65 = 6, 6, = Ve 


We have first to consider whether the conditions of 26.29 are satisfied. 
(1) Evidently p, is differentiable for all parameters any number of times. 


(2) We find— 


0 lt = a 
ee tee gO Oe) a (he = 8) 


= "i (@ —" —4) 


ee Z z 
ee = 2 log ps = a a a a {n, (Z1 — “)* + m2 (@. — w — 6)? + 1, Sf + 2 83} 


and (26.47) is seen to be satisfied. 
(3) The hypersurfaces ¢, = C, are evidently equivalent to 
Ne, + 2f, = Cj, 
where C, is an arbitrary parameter. The hypersurfaces ¢, = C, give similarly 
£, = C,. 
Both these are independent of 6, and their intersections, namely %, = constant, %, = con- 
stant, are independent of 9,. Thus the third condition is fulfilled and we may apply the 


foregoing theory.
The equations $\phi_1$ = constant, $\phi_2$ = constant, $\phi_3$ = constant are equivalent to
$\bar{x}_1$ = constant
$\bar{x}_2$ = constant
$n_1 s_1^2 + n_2 s_2^2$ = constant $= (n_1 + n_2) s_a^2$, say.


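The element $w_0$ is part of $W(\phi_1, \phi_2, \phi_3)$ within which

$$p_t \ge p_0 / h(\bar{x}_1, \bar{x}_2, s_a)$$

and this condition, by reference to the frequency function, becomes

$$\frac{1}{\sigma^{n_1+n_2}} \exp\left[ -\frac{1}{2\sigma^2}\{n_1(\bar{x}_1 - \mu)^2 + n_1 s_1^2\} - \frac{1}{2\sigma^2}\{n_2(\bar{x}_2 - \mu - b)^2 + n_2 s_2^2\} \right]$$

$$\le \frac{h}{\sigma_t^{n_1+n_2} \theta_4^{n_2}} \exp\left[ -\frac{1}{2\sigma_t^2}\{n_1(\bar{x}_1 - \mu_t)^2 + n_1 s_1^2 + n_2 \theta_4^{-2}(\bar{x}_2 - \mu_t - b_t)^2 + n_2 \theta_4^{-2} s_2^2\} \right].$$

Since the region w is independent of $\mu$, b and $\sigma$, we may put them respectively equal to $\mu_t$, $b_t$ and $\sigma_t$ and hence find for the condition

$$n_2 (1 - \theta_4^2) \{(\bar{x}_2 - \mu_t - b_t)^2 + s_2^2\} \le 2\sigma_t^2 \theta_4^2 (\log h - n_2 \log \theta_4).$$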
Since this inequality holds good on $\bar{x}_2$ = constant it contains only one variable $s_2^2$ and we accordingly find two cases:

If $\theta_4 = \sigma_2/\sigma_1 > 1$ the best region is defined by $s_2^2 \ge h_1(\bar{x}_1, \bar{x}_2, s_a^2)$;

If $\theta_4 = \sigma_2/\sigma_1 < 1$ the best region is defined by $s_2^2 \le h_1'(\bar{x}_1, \bar{x}_2, s_a^2)$.
We have now to determine $h_1$ so as to satisfy

$$\int_{w_0} p_0 \, dx = (1 - \alpha) \int_{W(\phi_1, \phi_2, \phi_3)} p_0 \, dx.$$

Now $W(\phi_1, \phi_2, \phi_3)$ is the locus for which $\bar{x}_1$, $\bar{x}_2$ and $s_a^2$ are constant, and thus the integral on the right is the product of $1 - \alpha$ and the frequency function $p_0(\bar{x}_1, \bar{x}_2, s_a^2)$. Similarly
that on the left is the integral of this function over the region for which s3 < h’. Thus 


hy” 
| Ghee — Po (1, Za, 82, 82) ds? in the first case, 
Ws hy’ 


I, ($1, $a, $s) 


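with a similar expression but different limits in the second. Now we have for the joint frequency function of $\bar{x}_1$, $\bar{x}_2$, $s_1^2$ and $s_2^2$

$$f \propto \frac{1}{\sigma^{n_1+n_2}} s_1^{n_1-3} s_2^{n_2-3} \exp\left[ -\frac{1}{2\sigma^2}\{n_1(\bar{x}_1 - \mu_1)^2 + n_2(\bar{x}_2 - \mu_2)^2 + (n_1 + n_2) s_a^2\} \right].$$

Transforming from $s_1^2$ to $s_a^2$ as variable, we find for the condition, after a little reduction:

$$\int_{h'}^{h''} \{(n_1 + n_2) s_a^2 - n_2 s_2^2\}^{\frac{n_1-3}{2}} s_2^{n_2-3} \, ds_2^2 = (1 - \alpha) \int_0^{(n_1+n_2) s_a^2/n_2} \{(n_1 + n_2) s_a^2 - n_2 s_2^2\}^{\frac{n_1-3}{2}} s_2^{n_2-3} \, ds_2^2,$$

whence, putting $n_2 s_2^2 = (n_1 + n_2) s_a^2 u$, we find:

$$\int_0^{u_0'} (1 - u)^{\frac{n_1-3}{2}} u^{\frac{n_2-3}{2}} \, du = \int_{u_0}^{1} (1 - u)^{\frac{n_1-3}{2}} u^{\frac{n_2-3}{2}} \, du = (1 - \alpha) \, B\left(\frac{n_2 - 1}{2}, \frac{n_1 - 1}{2}\right).$$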
0 Ue 2 2 


It follows that $u_0$, $u_0'$ depend only on $\alpha$, $n_1$ and $n_2$. Thus, whatever the values of $\bar{x}_1$, $\bar{x}_2$ and $s_a^2$, the best critical region is defined, in the first and second cases respectively, by

$$s_2^2 \ge h_1 = u_0 \frac{(n_1 + n_2) s_a^2}{n_2},$$

$$s_2^2 \le h_1' = u_0' \frac{(n_1 + n_2) s_a^2}{n_2}.$$

These are equivalent to

$$u = \frac{n_2 s_2^2}{n_1 s_1^2 + n_2 s_2^2} \ge u_0 \quad \text{if } \sigma_1 < \sigma_2,$$

$$u \le u_0' \quad \text{if } \sigma_1 > \sigma_2.$$

If we put

$$z = \tfrac{1}{2} \log \frac{n_2 (n_1 - 1) s_2^2}{n_1 (n_2 - 1) s_1^2},$$

the B-distribution of u reduces to Fisher’s form. The result we have reached is therefore equivalent to showing that the z-test is the best for the ratio of two variances in normal samples. As usual, there is no U.M.P. test for the whole range of the ratio from 0 to $\infty$, but two U.M.P. tests for the ranges 0 to 1 and 1 to $\infty$ respectively.
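
As a rough illustration of the two equivalent forms of the criterion in Example 26.8, the sketch below computes u and Fisher's z from two samples. The function name and the data are hypothetical and are not taken from the text; the variances use the divisor n, as the example assumes.

```python
import math

def variance_ratio_statistics(sample1, sample2):
    """Return (u, z) for two samples; s_i^2 are divisor-n sample variances."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    s1_sq = sum((x - mean1) ** 2 for x in sample1) / n1
    s2_sq = sum((x - mean2) ** 2 for x in sample2) / n2
    # u = n2 s2^2 / (n1 s1^2 + n2 s2^2), the variate with the B-distribution
    u = n2 * s2_sq / (n1 * s1_sq + n2 * s2_sq)
    # Fisher's z: half the log of the ratio of the unbiased variance estimates
    z = 0.5 * math.log((n2 * s2_sq / (n2 - 1)) / (n1 * s1_sq / (n1 - 1)))
    return u, z

# Illustrative data only (an assumption, not from the text).
sample1 = [4.1, 5.0, 3.8, 4.6, 5.3, 4.4]
sample2 = [3.0, 6.2, 4.9, 2.7, 6.8, 5.1, 4.0]
u, z = variance_ratio_statistics(sample1, sample2)
print(u, z)
```

Since z is a monotone function of u, a critical region of the form u ≥ u_0 is the same as one of the form z ≥ constant, which is why the two statements of the test agree.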


Example 26.9. Difference of Two Means 


Consider again the previous example, where now the variances are unspecified but equal and the means $\mu_1$ and $\mu_2 = \mu_1 + b$ may have any values. The hypothesis $H_0$ is that b = 0 and has two degrees of freedom corresponding to $\mu$ and $\sigma$.

Let the alternative $H_t$ specify the parameters

$$\theta_1 = \mu_t, \quad \theta_2 = \sigma_t, \quad \theta_3 = b_t.$$

In addition to the quantities required in the previous Example we now use also $\bar{x}_0$ and $s_0^2$, the mean and variance of the pooled samples.
We find that the three conditions of 26.29 are satisfied, and

$$\phi_1 = \frac{n_1 + n_2}{\sigma^2}(\bar{x}_0 - \mu), \qquad \phi_2 = -\frac{n_1 + n_2}{\sigma} + \frac{n_1 + n_2}{\sigma^3}\{(\bar{x}_0 - \mu)^2 + s_0^2\}.$$

Equivalent to this family are the surfaces
$$\bar{x}_0 = C_1', \qquad s_0^2 = C_2'.$$
The condition $p_t \ge h(\phi_1, \phi_2) \, p_0$ reduces to
$$b_t (\bar{x}_1 - \bar{x}_2) \le h'(\bar{x}_0, s_0^2),$$
and as usual we find two cases according as $\mu_2 > \mu_1$ or vice-versa. We consider only the first, the second being analogous.
Writing v = %, — &, > k, we have to determine h’ by 


Fes?” : pv 
| m (@, #2, ») dv =(1—a) |p, (G #8, v) de, 
h’” Piped 


where h’”’ and h®” are the lower and upper limits of the variation of v for fixed values of 2, 
and s?. 
The frequency function of %, 82, v and sj is easily found to be 


a Ny hi Ma—3 Ny +n 

Fo om3f (ms + ma) ob — mot — BEBE oA YE exp | — STE (00 — malt + a] 
al 42, 

whence that of %, sf and v is found to be 


Ma +n.—4 n 5 
i x (si _ ree 0) 2 exp | — ae Sa Og teal) $) |. 
1 2 




Since # and s? are constant over the domains under consideration we have to satisfy 


hy” M+ny—4 nie n,tn—4 
{ oe Ny Ne *) 2 a =20 aay (3 — Go") 2 dv 
hee (nm + Nz)" 0 (ny “ Ne) 


where h a (my + Ne) So peal (ny, +f Nz) So 


V/(n, Ms) ” V/ (11 2) ; 


_ (m+ M2) So z 
i V(t 1s) (1 + 22)? 


this reduces to 


1 Zo” dz 
ln, +n, — 2 \ imei ~ a 
Ble, or a 0 ee 
% 2 
on = ie Ny, Ne 
and Np Sal SE) Pabst JF 
/(n, 83 + nz 8) NP + Ns 


We have thus arrived at the t-test for the difference of two means in normal variation when 
variances are equal. Once again the test we introduced on more or less intuitive grounds 
has been shown to be justified in the light of the theory developed in this chapter. 
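
The statistic to which Example 26.9 leads can be sketched as follows. This is a minimal illustration under the example's assumptions (normal samples, equal variances); the function name and the data are hypothetical.

```python
import math

def two_sample_t(sample1, sample2):
    """Return (t, degrees of freedom) for the difference of two means."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # n1 s1^2 + n2 s2^2: the pooled sum of squares about the sample means
    pooled_ss = (sum((x - mean1) ** 2 for x in sample1)
                 + sum((x - mean2) ** 2 for x in sample2))
    dof = n1 + n2 - 2
    standard_error = math.sqrt(pooled_ss / dof * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / standard_error, dof

# Illustrative data only (an assumption, not from the text).
t, dof = two_sample_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(t, dof)
```

Large positive or negative values of t, according to the sign of b under the alternative, correspond to the two one-sided cases distinguished in the example.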


Linear Hypotheses in Normal Variation 


26.30. Several of the hypotheses dealt with in foregoing examples are particular 
cases of a general class known as linear hypotheses, which accounts for the fact that we 
keep arriving at the same sort of conclusions respecting them. 

Suppose we have n independent variates typified by $x_k$, distributed in the normal form

$$\frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{1}{2\sigma^2}(x_k - \mu_k)^2 \right\} dx_k$$

with common variance $\sigma^2$ but different means. Suppose the means are connected with r + s unknown parameters $\theta_1, \ldots, \theta_r, \ldots, \theta_{r+s}$ by linear equations of the type

$$\mu_k = \sum_j c_{kj} \theta_j. \qquad (26.51)$$

Suppose further that the hypothesis $H_0$ specifies r parameters

$$\theta_1 = \theta_{10}, \; \ldots, \; \theta_r = \theta_{r0},$$

and hence is composite with s degrees of freedom. Then $H_0$ will be called a “linear hypothesis”. The reader can verify for himself that “Student’s” hypothesis, and the hypothesis as to the difference of two means when variances are equal, are of this type. The homogeneity test in variance-analysis and the test of regression coefficients are also reducible to the same form. If, of course, $H_0$ specifies r linear relations among the $\theta$’s instead of the $\theta$’s themselves, it can be reduced to a hypothesis which specifies the $\theta$’s directly, except perhaps in degenerate cases which need not detain us.


26.31. The theory developed in the earlier part of the chapter for composite hypotheses may be applied to linear hypotheses as we have defined them, and the argument follows exactly that of Examples 26.8 and 26.9. It is readily verified that the three conditions of 26.29 are satisfied. We have:


_ _ ae — Ue 
|} Ss ow | RE, 52) 


$; = constant 


nN 1 

Po = —— + % (ye — Mx)? 
4 aa a ; . (26.53) 
n 

Picea Pe 


We can therefore find similar regions w (¢, . . . ¢,, ¢,) and select from them the best 
critical regions in the usual manner. We will omit the rather cumbrous algebra and quote the following result:


Transform to new variates L, » Epsgs Yrts+i + + + Yn by the equation 
r+s 
Le pe Cy, By + 5 CeYjy + +» +  « (26.54) 
j=rt+st+1 : 
where the c’s are those given in (26.51) for j,k <r + sand the other c’s are orthogonal, i.e. 
k 
Dy =O; k#j, gorct+s ; _ (26.55) 
=l, k=, j>rts 
Define 
mr 
i SP I . ae) 
j=rtst+1 
: n a 2 : 
and nS? = ( Ci 7) . ° ° . . (26.57) 
k=1 \Nj=1 
A further transformation of Z,,, ... H,,, 1s now made to variables y,,, ... y,45; 80 
that (26.57) becomes 
r+s 
nS? = D3 Ry EE, + D' ye.  .  . — « (26.58) 
j,k=1 k=r+1 
r+s 
= nS? +. > ye. kw ww (26.59) 
k=r+1 
The coefficients R can, of course, be obtained from the c’s by ordinary determinantal 


algebra. 
Writing now ¢; = =o; — 0}, ie. the difference between 6, on the alternative hypothesis 
and its value if H, is true, we find that the best critical region is given by 


F Ps Ry, €; Ey 
J (nS? + nS3) IC Staae) 


j,k=1 


Sea e550) 




where v is distributed in the form

$$dF \propto (1 - v^2)^{\frac{n-s-3}{2}} \, dv, \qquad -1 \le v \le 1, \qquad (26.61)$$

and $v_0$ is given by

$$1 - \alpha = \int_{v_0}^{1} dF. \qquad (26.62)$$


26.32. There is one interesting conclusion to be drawn from (26.60). If a U.M.P. test exists, v should be independent of $\theta_j$ and hence of $\epsilon_j$. This appears to be possible only if the denominator in the second part of (26.60) is rational. But this denominator is seen from (26.59) to have the coefficients of a positive definite form and hence is only rational if r = 1. We conclude that if r ≥ 2 no U.M.P. test is possible for linear hypotheses in normal variation.

We have already seen that under general conditions no U.M.P. test exists for r = 1. A similar conclusion follows from (26.60) if r = 1, for it then becomes


$$v = \frac{R_{11} \epsilon_1 \xi_1}{\sqrt{(nS_a^2 + nS_b^2)} \, \sqrt{(R_{11})} \, |\epsilon_1|}, \qquad (26.63)$$

which, as usual, leads to two cases according as $\epsilon_1 \gtrless 0$.


26.33. We will pause at this point to review our results. We began by defining two kinds of error and showing that a test could be defined as “best” for a single alternative hypothesis if it controlled the first kind and reduced the second to a minimum. When there is a class of admissible alternatives we may sometimes arrive at a U.M.P. test which will minimise errors of the second kind for any member of the class, and such a test may be regarded as the best attainable. Though the U.M.P. test does not exist in the great majority of cases, we may find tests which are U.M.P. for either $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$. Such tests have been reached for “Student’s” hypothesis and several others in common use, and are found to give the same tests as those introduced on rather intuitive grounds in Chapter 21.


26.34. The absence of a U.M.P. test implies that in the majority of cases we have to look for other criteria to provide “best” tests. In the remainder of this chapter and in the next we shall consider several lines of approach which have been developed:

(a) Relying on 26.18 we may evolve tests based on the likelihood ratio. These will give U.M.P. tests if such exist, and in the contrary case will do their best, so to speak, by finding the greatest common denominator among the best critical regions.

(b) We may consider the properties of tests when the sample number n tends to infinity, and so obtain tests which are U.M.P. in the limit. Such tests, like maximum likelihood estimators, may be employed on the grounds that they are “best” for large n and presumably good for small n.

(c) We may derive a new criterion from the concept of bias in statistical tests, which 
will be explained in the next chapter. 

(d) Recognizing that there is no test which is U.M.P. everywhere, we may seek for one which is U.M.P. in the neighbourhood of the true value. The idea behind this approach is that it will be more important to detect errors in the neighbourhood of the true value, and that large errors may be left to look after themselves, either because they are infrequent or because almost any “reasonable” test will reveal them.*

(e) When a number of independent parameters are involved, we may abandon the attempt to test for each separately and confine our attention to the class of hypotheses for
which they are functionally related, e.g. by y =f (0, . . . 9,).. This reduces our problem 
to the case of a single parameter y, and we may be able to show that a particular », is the 
best in the sense that it is U.M.P. with respect to all other y’s, that is, to all other tests 
depending on the single function of the unknown parameters. 

We proceed to consider these approaches. 


Tests Based on Likelihood 
26.35. Suppose that for a given member of a composite hypothesis $H_0$ the joint sampling distribution of the variables $x_1, \ldots, x_n$ has a frequency function $p_0$ (which is, of course, the likelihood). Considering the x’s as fixed, we may examine the variation of $p_0$ according to variation in the unspecified parameters $\theta_1, \ldots, \theta_s$ which form a set, say $\omega$. Let $p_0(\omega \text{ max.})$ be the maximum value of $p_0$ for such variation. Similarly, if $\Omega$ is the class of admissible alternatives $H_t$, let $p_t(\Omega \text{ max.})$ be the maximum of the likelihood for variations of all the parameters $\theta$. Write

$$\lambda = \frac{p_0(\omega \text{ max.})}{p_t(\Omega \text{ max.})}. \qquad (26.64)$$

Then a possible criterion for accepting $H_0$ is to take as critical regions those points for which

$$\lambda \le \text{constant} = C, \text{ say}, \qquad (26.65)$$

where C is determined by relation to a probability level $\alpha$ from the sampling distribution of $\lambda$, which of course is independent of the unknown parameters. In defining $\lambda$ we have assumed that the maxima on the right of (26.64) exist, but we can give the equation greater generality by taking $p_0(\omega \text{ max.})$ as the upper bound of values of $p_0$ in the set $\omega$ where no maximum exists; and so for $\Omega$.

In this form the criterion states that we are to accept $H_0$ if the maximum likelihood in the set of permissible $H_0$’s is greater than a specified proportion of that in the set of alternatives $H_t$. In doing so we control the first kind of error in the ordinary way. So far as concerns the second kind of error we saw in 26.18 that for $H_0$ simple the criterion provided a sort of highest common factor among available tests; and presumably qualities of this kind will be equally useful when $H_0$ is composite.
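
As a small worked case of the ratio (26.64), the sketch below takes a normal sample with unspecified variance and the hypothesis that the mean equals a given value. Maximising over the variance alone gives the numerator, and over mean and variance together the denominator. The function name, the value of mu0 and the data are assumptions made for illustration.

```python
import math

def likelihood_ratio_normal_mean(xs, mu0):
    """Return (lambda, t) for H0: mean = mu0, variance unspecified."""
    n = len(xs)
    mean = sum(xs) / n
    var_free = sum((x - mean) ** 2 for x in xs) / n   # ML variance in Omega
    var_null = sum((x - mu0) ** 2 for x in xs) / n    # ML variance in omega
    # Each maximised likelihood is (2 pi var)^(-n/2) e^(-n/2), so the
    # ratio depends on the two variance estimates only.
    lam = (var_free / var_null) ** (n / 2)
    t = (mean - mu0) / math.sqrt(var_free / (n - 1))
    return lam, t

# Illustrative data only (an assumption, not from the text).
xs = [10.2, 9.7, 10.9, 10.4, 9.5, 10.8, 10.1]
lam, t = likelihood_ratio_normal_mean(xs, mu0=10.0)
print(lam, t)
```

In this case $\lambda^{2/n} = 1/\{1 + t^2/(n-1)\}$, so small values of $\lambda$ correspond to large values of $|t|$ and the criterion reduces to a test on t.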


The Problem of k Samples 

26.36. We will illustrate the theory of the likelihood tests by discussing a problem of considerable practical importance. Suppose we have a sample from each of k normal populations, $x_{ij}$ being the jth member of the ith sample. Let

$n_i$ be the number in the ith sample;

$N = \sum (n_i)$ be the total number of observations;

$\bar{x}_i$ be the mean of the ith sample;

$s_i^2$ be the variance of the ith sample.

* An alternative line would be to concentrate on errors of the second kind for wider deviations,
on the ground that large errors are more important than small ones. I understand from Dr. B. L. Welch 
that he considered this approach shortly before the war ; the results did not differ very materially from 
those given by requiring optimum properties near the true value in the case he examined, and the 
results were not published. 


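We will consider three different hypotheses:

(1) $H$, that all populations are the same and hence have the same unspecified mean and unspecified variance.
(2) $H_1$, that they have the same variance but different unspecified means $\mu_1, \ldots, \mu_k$.
(3) $H_2$, when it is known that they have the same variance, that they have the same means.

We have for the joint distribution

$$p = \frac{1}{(2\pi)^{N/2} \prod \sigma_i^{n_i}} \exp\left[ -\sum_{i=1}^{k} n_i \frac{(\bar{x}_i - \mu_i)^2 + s_i^2}{2\sigma_i^2} \right].$$

Consider first of all $H$. We find, for $p(\Omega \text{ max.})$,

$$\hat{\mu}_i = \bar{x}_i, \qquad (26.66)$$

$$\hat{\sigma}_i^2 = s_i^2, \qquad (26.67)$$

and for $p(\omega \text{ max.})$, putting all the $\mu$’s and $\sigma$’s equal and equating the first partials of $\log p$ to zero,

$$\hat{\mu} = \bar{x}_0 = \frac{1}{N} \sum_{i=1}^{k} n_i \bar{x}_i, \qquad (26.68)$$

$$\hat{\sigma}^2 = s_0^2 = \frac{1}{N} \sum_{i=1}^{k} n_i \{(\bar{x}_i - \bar{x}_0)^2 + s_i^2\}. \qquad (26.69)$$

Inserting these values in p we find, after a little reduction,

$$\lambda_H = \prod_{i=1}^{k} \left( \frac{s_i^2}{s_0^2} \right)^{n_i/2}. \qquad (26.70)$$

Similarly it may be shown that

$$\lambda_{H_1} = \prod_{i=1}^{k} \left( \frac{s_i^2}{s_a^2} \right)^{n_i/2}, \qquad (26.71)$$

where

$$s_a^2 = \frac{1}{N} \sum_{i=1}^{k} n_i s_i^2, \qquad (26.72)$$

and also that

$$\lambda_{H_2} = \left( \frac{s_a^2}{s_0^2} \right)^{N/2}. \qquad (26.73)$$

It will be noticed that $\lambda_H = \lambda_{H_1} \lambda_{H_2}$.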
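
The three criteria can be computed directly from the sample sizes, means and variances. The sketch below returns their (2/N)th powers, which are easier to handle numerically, and checks the product relation. The function name is hypothetical; the figures are those tabulated in Exercise 26.6, with the fourth row read as sample 4.

```python
import math

def k_sample_criteria(sizes, means, variances):
    """Return the (2/N)th powers of lambda_H, lambda_H1 and lambda_H2."""
    N = sum(sizes)
    grand_mean = sum(n * m for n, m in zip(sizes, means)) / N
    # s_a^2: weighted mean of the sample variances, as in (26.72)
    s_a_sq = sum(n * v for n, v in zip(sizes, variances)) / N
    # s_0^2: variance of the pooled samples, as in (26.69)
    s_0_sq = s_a_sq + sum(n * (m - grand_mean) ** 2
                          for n, m in zip(sizes, means)) / N
    # weighted geometric mean of the sample variances
    geo_mean = math.exp(sum(n * math.log(v)
                            for n, v in zip(sizes, variances)) / N)
    return geo_mean / s_0_sq, geo_mean / s_a_sq, s_a_sq / s_0_sq

sizes = [6, 6, 6, 6, 6, 6]
means = [8433, 8200, 7933, 8120, 7971, 8263]
variances = [24722, 94133, 149733, 45037, 88480, 49921]
L, L1, L2 = k_sample_criteria(sizes, means, variances)
print(L, L1, L2)
```

Working with logarithms of the variances avoids forming the very small products that (26.70) and (26.71) would give for large N.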
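26.37. The function $\lambda_{H_2}$ may be related to the correlation ratio $\eta^2$. We have

$$s_0^2 = s_a^2 + \frac{1}{N} \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x}_0)^2, \qquad (26.74)$$

and hence

$$\lambda_{H_2} = \left\{ 1 - \frac{\sum n_i (\bar{x}_i - \bar{x}_0)^2}{N s_0^2} \right\}^{N/2} = (1 - \eta^2)^{N/2}. \qquad (26.75)$$

The distribution of $\lambda_{H_2}$ is thus obtainable directly from the known form for $\eta^2$ in samples from an uncorrelated population.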


We also find

$$(\lambda_{H_1})^{2/N} = \frac{\prod (s_i^2)^{n_i/N}}{s_a^2}, \qquad (26.76)$$

$$(\lambda_H)^{2/N} = \frac{\prod (s_i^2)^{n_i/N}}{s_0^2}. \qquad (26.77)$$

The distribution of $(\lambda_{H_2})^{2/N}$ is that of $1 - \eta^2$, where the distribution of $\eta^2$ is

$$dF \propto (\eta^2)^{\frac{k-3}{2}} (1 - \eta^2)^{\frac{N-k-2}{2}} \, d\eta^2. \qquad (26.78)$$

It can accordingly be tested in this distribution or the related z-form. This is, in fact, the criterion used in the analysis of variance for homogeneity tests, and it is interesting to remark that the z-test here arises in considering the hypothesis that the various distributions parent to the sample values, being already known to have the same variance, have the same mean. The other form of hypothesis, $H$, is that the samples come from the same population, and the equality of variance is not part of the data but part of the hypothesis. We are not then surprised, or should not be so, to find that the $\lambda_H$ criterion leads to a different test.


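26.38. The moments of the distribution of $\lambda_H$ may be obtained as follows. The joint distribution of $\bar{x}_i$ and $s_i$ is

$$dF \propto \prod_i \left[ s_i^{n_i-2} \exp\left\{ -\frac{n_i}{2\sigma^2}\{(\bar{x}_i - \mu)^2 + s_i^2\} \right\} \right] \prod d\bar{x}_i \prod ds_i. \qquad (26.79)$$

The distribution of means is independent of that of variances and can be ignored. Further, if

$$\chi^2 = \frac{1}{\sigma^2} \sum n_i (\bar{x}_i - \bar{x}_0)^2$$

then $\chi^2$ is also independent of the variances, and we have

$$dF \propto \prod (s_i)^{n_i-2} \exp\left( -\frac{\sum n_i s_i^2}{2\sigma^2} \right) \chi^{k-2} \exp(-\tfrac{1}{2}\chi^2) \prod ds_i \, d\chi. \qquad (26.80)$$

Put now

$$y_i = \frac{n_i s_i^2}{N s_0^2}, \qquad (26.81)$$

and note that

$$\sigma^2 \chi^2 = N s_0^2 - \sum n_i s_i^2 = N s_0^2 (1 - \sum y_i). \qquad (26.82)$$

Transforming to variables y and $s_0$, we find

$$dF \propto \prod y_i^{\frac{n_i-3}{2}} (1 - \sum y_i)^{\frac{k-3}{2}} \prod dy_i \; s_0^{N-2} \exp\left( -\frac{N s_0^2}{2\sigma^2} \right) ds_0,$$

whence, for the distribution of the y’s,

$$dF \propto \prod y_i^{\frac{n_i-3}{2}} (1 - \sum y_i)^{\frac{k-3}{2}} \prod dy_i. \qquad (26.83)$$

Now

$$\lambda_H = \prod \left( \frac{N y_i}{n_i} \right)^{n_i/2}, \qquad (26.84)$$

and hence we may find the moments of $\lambda_H$ by integrating its powers over the distribution (26.83). Integrals of this kind, known as Dirichlet’s, are expressible in terms of gamma functions and we find, for the pth moment of $\lambda_H$ about zero,

$$\mu_p'(\lambda_H) = \frac{N^{pN/2}}{\prod n_i^{p n_i/2}} \cdot \frac{\Gamma\{\tfrac{1}{2}(N-1)\}}{\Gamma\{\tfrac{1}{2}(N(1+p)-1)\}} \prod_{i=1}^{k} \frac{\Gamma\{\tfrac{1}{2}(n_i(1+p)-1)\}}{\Gamma\{\tfrac{1}{2}(n_i-1)\}}. \qquad (26.85)$$

When all the n’s are equal this reduces to

$$\mu_p'(\lambda_H) = k^{pkn/2} \, \frac{\Gamma\{\tfrac{1}{2}(kn-1)\}}{\Gamma\{\tfrac{1}{2}(kn(1+p)-1)\}} \left[ \frac{\Gamma\{\tfrac{1}{2}(n(1+p)-1)\}}{\Gamma\{\tfrac{1}{2}(n-1)\}} \right]^k. \qquad (26.86)$$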


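26.39. For the criterion $\lambda_{H_1}$ we start from the distribution

$$dF \propto \prod s_i^{n_i-2} \exp\left\{ -\frac{1}{2\sigma^2} \sum (n_i s_i^2) \right\} \prod ds_i$$

and on putting

$$\zeta_i = \frac{n_i s_i^2}{N s_a^2}, \qquad i = 1, 2, \ldots, k-1, \qquad (26.87)$$

$$n_k s_k^2 = N s_a^2 \left( 1 - \sum_{i=1}^{k-1} \zeta_i \right), \qquad (26.88)$$

we find, in much the same way as before,

$$dF \propto \prod_{i=1}^{k-1} \zeta_i^{\frac{n_i-3}{2}} \left( 1 - \sum_{i=1}^{k-1} \zeta_i \right)^{\frac{n_k-3}{2}} \prod_{i=1}^{k-1} d\zeta_i. \qquad (26.89)$$

Further,

$$\lambda_{H_1} = N^{N/2} \left\{ \prod_{i=1}^{k-1} \left( \frac{\zeta_i}{n_i} \right)^{n_i/2} \right\} \left\{ \frac{1 - \sum \zeta_i}{n_k} \right\}^{n_k/2}, \qquad (26.90)$$

whence we find

$$\mu_p'(\lambda_{H_1}) = \frac{N^{pN/2}}{\prod n_i^{p n_i/2}} \cdot \frac{\Gamma\{\tfrac{1}{2}(N-k)\}}{\Gamma\{\tfrac{1}{2}(N(1+p)-k)\}} \prod_{i=1}^{k} \frac{\Gamma\{\tfrac{1}{2}(n_i(1+p)-1)\}}{\Gamma\{\tfrac{1}{2}(n_i-1)\}}. \qquad (26.91)$$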


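26.40. For large $n_i$ we find, in virtue of the Stirling approximation to the gamma function,

(1) for $\lambda_H$, $\mu_p' \to \dfrac{1}{(1+p)^{k-1}}$;

(2) for $\lambda_{H_1}$, $\mu_p' \to \dfrac{1}{(1+p)^{(k-1)/2}}$;

(3) for $\lambda_{H_2}$, $\mu_p' \to \dfrac{1}{(1+p)^{(k-1)/2}}$.

These limiting forms are the moments of the distributions

(1) $dF = \dfrac{(-\log \lambda)^{k-2}}{\Gamma(k-1)} \, d\lambda$;

(2) and (3) $dF = \dfrac{(-\log \lambda)^{(k-3)/2}}{\Gamma\{\tfrac{1}{2}(k-1)\}} \, d\lambda$.

Hence, by the transformation $\lambda = e^{-\frac{1}{2}\chi^2}$ we see that approximately $-2\log\lambda_H$ is distributed as $\chi^2$ with $\nu = 2k-2$, and $-2\log\lambda_{H_1}$ and $-2\log\lambda_{H_2}$ as $\chi^2$ with $\nu = k-1$.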
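
The approach to the limiting form can be checked numerically. The sketch below evaluates the pth moment of $\lambda_H$ from its gamma-function expression, written out in the code comments, and compares it with $(1+p)^{-(k-1)}$ for large samples. The function name and the chosen sample sizes are assumptions for illustration.

```python
import math

def log_moment_lambda_H(p, sizes):
    """Log of the pth moment of lambda_H about zero.

    Uses N^(pN/2) / prod n_i^(p n_i/2)
         * Gamma((N-1)/2) / Gamma((N(1+p)-1)/2)
         * prod Gamma((n_i(1+p)-1)/2) / Gamma((n_i-1)/2),
    evaluated on the log scale to avoid overflow.
    """
    N = sum(sizes)
    total = 0.5 * p * N * math.log(N)
    total += math.lgamma((N - 1) / 2) - math.lgamma((N * (1 + p) - 1) / 2)
    for n in sizes:
        total -= 0.5 * p * n * math.log(n)
        total += math.lgamma((n * (1 + p) - 1) / 2) - math.lgamma((n - 1) / 2)
    return total

sizes = [2000, 2000, 2000]   # large, equal samples (illustrative)
k = len(sizes)
p = 1.0
exact = math.exp(log_moment_lambda_H(p, sizes))
limit = (1 + p) ** (-(k - 1))
print(exact, limit)
```

For small samples the two values differ appreciably, which is the reason for the moment-fitting approximations mentioned in the next section.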


26.41. For small samples Neyman and Pearson have suggested approximating to 
the distributions of Ay’ and a by identifying their lower moments with those of the 
form 

Occ os (een yt 
This possibility has been examined in detail by Nayer (1936) for the hypothesis H, when 


all the n’s are equal. The distribution of A, has also been studied by Wilks and Thompson 
(1937a). 


26.42. Modified forms of the above tests have been considered by various authors. We may write

$$\log \lambda_{H_1} = \tfrac{1}{2} \sum n_i \log \frac{s_i^2}{s_a^2}, \qquad (26.92)$$

where, of course,

$$s_a^2 = \frac{1}{N} \sum n_i s_i^2.$$

In short, $s_a^2$ is a weighted mean of the $s_i^2$ and $(\lambda_{H_1})^{2/N}$ is the ratio to it of their weighted geometric mean. Bartlett (1937c) has proposed using the degrees of freedom $\nu_i$ $(= n_i - 1)$ instead of $n_i$ in these equations, that is to say, defines a criterion

$$\mu = \prod \left( \frac{s_i^2}{s^2} \right)^{\nu_i/2}. \qquad (26.93)$$

This test is, in the sense defined in the next chapter, unbiassed, whereas that based on $\lambda_{H_1}$ is not. Bartlett also suggested as an approximation that $-\dfrac{2 \log \mu}{c}$ could be regarded as distributed as $\chi^2$ with $k - 1$ degrees of freedom, c being given by

$$c = 1 + \frac{1}{3(k-1)} \left\{ \sum \left( \frac{1}{\nu_i} \right) - \frac{1}{\sum \nu_i} \right\}. \qquad (26.94)$$

This has recently been reconsidered by Hartley (1940), who showed that it is not very exact for large k and gave a better approximation which can be reduced to tabular form. Cf. Exercise 27.2.
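
Bartlett's modified criterion and its correction factor c can be sketched as below. The variances here are taken with divisor $\nu_i$, and the pooled variance is their mean weighted by degrees of freedom; both are assumptions about how the modified equations are to be read. The function name and data are hypothetical.

```python
import math

def bartlett_statistic(samples):
    """Return (-2 log(mu) / c, degrees of freedom) for k samples."""
    k = len(samples)
    dofs = [len(s) - 1 for s in samples]
    variances = []
    for s, nu in zip(samples, dofs):
        mean = sum(s) / len(s)
        variances.append(sum((x - mean) ** 2 for x in s) / nu)
    nu_total = sum(dofs)
    pooled = sum(nu * v for nu, v in zip(dofs, variances)) / nu_total
    # -2 log mu = sum nu_i log(s^2 / s_i^2)
    minus_two_log_mu = (nu_total * math.log(pooled)
                        - sum(nu * math.log(v) for nu, v in zip(dofs, variances)))
    c = 1 + (sum(1 / nu for nu in dofs) - 1 / nu_total) / (3 * (k - 1))
    return minus_two_log_mu / c, k - 1

# Illustrative data only (an assumption, not from the text).
samples = [[4.1, 5.0, 3.8, 4.6, 5.3],
           [3.0, 6.2, 4.9, 2.7, 6.8, 5.1],
           [4.4, 4.7, 4.5, 4.9]]
statistic, dof = bartlett_statistic(samples)
print(statistic, dof)
```

The returned statistic would be referred to the $\chi^2$ distribution with k - 1 degrees of freedom, subject to the reservations about large k noted above.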




Likelihood Criteria for the Linear Hypothesis 
26.43. We now proceed to consider the application of the likelihood criterion to the class of linear hypothesis as defined in 26.30. We have, for the likelihood function,

$$p = \left( \frac{1}{\sigma\sqrt{2\pi}} \right)^n \exp\left\{ -\frac{1}{2\sigma^2} \sum (x_k - \mu_k)^2 \right\}. \qquad (26.95)$$

Writing $S^2 = \sum (x_k - \mu_k)^2$ we have, for the stationary values of p, with respect to $\sigma$ and the parameters $\theta$ (related to the $\mu$’s by (26.51)),

$$\frac{\partial}{\partial \sigma} \log p = -\frac{n}{\sigma} + \frac{S^2}{\sigma^3} = 0, \qquad (26.96)$$

$$\frac{\partial}{\partial \theta_j} \log p = \frac{1}{\sigma^2} \sum_k (x_k - \mu_k) c_{kj} = 0. \qquad (26.97)$$

This last equation is clearly the one we should get if we were seeking to minimise $S^2$ itself for variations in the $\theta$’s. Let $nS_a^2$ be this minimum value. We shall then have, from (26.96),

$$\sigma^2 = S_a^2. \qquad (26.98)$$

The maximum of p in the class $\Omega$ of admissible hypotheses is then

$$p(\Omega \text{ max.}) = \left( \frac{1}{S_a\sqrt{2\pi}} \right)^n e^{-n/2}. \qquad (26.99)$$

Similarly the maximum of p in the class $\omega$ for which $\theta_1, \ldots, \theta_r$ are fixed and the other s $\theta$’s vary, is found to be

$$p(\omega \text{ max.}) = \left( \frac{1}{\sqrt{2\pi} \sqrt{(S_a^2 + S_b^2)}} \right)^n e^{-n/2}, \qquad (26.100)$$

where $n(S_a^2 + S_b^2)$ is the minimum of $S^2$ under the conditions that $\theta_1, \ldots, \theta_r$ are fixed. Thus we find for the likelihood ratio $\lambda$

$$\lambda^{2/n} = \frac{1}{1 + \dfrac{S_b^2}{S_a^2}}, \qquad (26.101)$$

or, if more convenient, we may use the function

$$Z = \frac{S_b}{S_a}$$

to provide a criterion.
Now we make the transformation (26.54) and show that the values 8, and S, as we 
have defined them here have, in fact, the values given by (26.56) and (26.59). We have, 


from (26.54), 
n r+s n 3 
= 2 — m= 3 Satis 3) can 


ial Mo j=rts+1 


=D) (Een B)* + > (Een y)? 
k=1 k=1 


-» (2 oy, B;)? + Ss Yj. 
= 


j=r+stl 




Since n S? is the minimum of S? for all variations of the 6’s and EH and y are independent 
of the 6’s, we must have 
nS? = Ly}. 
Also, since nS? is the minimum of S? when the values 6, . . . 0, are fixed, it is seen to have 
the value given in (26.59). 
We have also 
S?=—nSt+nS? . 2 ‘ : ‘ . (26.102) 


n r+s 2 
where ns, = , ( > Cir z,) ; 
k=1 \F=1 


and the frequency function of H’s and y’s is given by 
n : 
P(E... « Enses Yeseta «+ = Ya) © OXp { — 2 +sy}. . (26.103) 


Now $nS_a^2$ is the sum of squares of $n - r - s$ normal variates, and hence

$$f(S_a) \propto S_a^{n-r-s-1} \exp\left( -\frac{nS_a^2}{2\sigma^2} \right). \qquad (26.104)$$
Hence, since the H’s are independent of the y’s, and since S? depends only on the y’s, 
f (Sq Hy. . + Byy,) o¢ S*-?-8-1 exp { — (83 + 83) \ . (26.105) 
We have seen, in effect, that n S? is the minimum value of 82. It depends on H,... &, 


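and hence is independent of $S_a$ and is distributed as

$$f(S_b) \propto S_b^{r-1} \exp\left( -\frac{nS_b^2}{2\sigma^2} \right).$$

Thus we have

$$f(S_a, S_b) \propto S_a^{n-r-s-1} S_b^{r-1} \exp\left\{ -\frac{n}{2\sigma^2}(S_a^2 + S_b^2) \right\}. \qquad (26.106)$$

Putting now $Z = S_b/S_a$, we find

$$f(Z) \propto \frac{Z^{r-1}}{(1 + Z^2)^{\frac{n-s}{2}}}, \qquad (26.107)$$

which may be reduced to Fisher’s form by putting

$$z = \tfrac{1}{2} \log \frac{S_b^2 (n - r - s)}{S_a^2 \, r} = \log Z + \tfrac{1}{2} \log \frac{n - r - s}{r}. \qquad (26.108)$$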


We have thus reduced the test of the linear hypothesis to the z-test and it is seen that several of the tests introduced in Chapter 21 can be justified on the likelihood criterion. These include the “Student” test for one mean, the extended form for the difference of two means, and the test for the ratio of variances. Certain other tests in which the z-distribution (which, of course, reduces to the t-distribution for $\nu_1 = 1$) appears, such as that of the correlation ratio, the multiple correlation coefficient and regression coefficients, also depend on the linear hypotheses, and in the light of the theory here presented are seen to be different aspects of the same thing, at least so far as the testing of hypotheses is concerned.
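
A simple instance is the test of a regression coefficient. In the sketch below the means are taken as $\mu_k = \theta_1 t_k + \theta_2$ and the hypothesis specifies $\theta_1 = 0$, so that r = 1 and s = 1. The two minimised sums of squares give Z, the likelihood ratio and Fisher's z. The function name, the regressor values t and the data are assumptions for illustration.

```python
import math

def slope_test(t, x):
    """Return (lambda, Z, z) for the hypothesis of zero slope."""
    n = len(x)
    t_bar = sum(t) / n
    x_bar = sum(x) / n
    stt = sum((ti - t_bar) ** 2 for ti in t)
    stx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = stx / stt
    rss_full = sxx - slope * stx   # n S_a^2: minimum over both parameters
    rss_null = sxx                 # n (S_a^2 + S_b^2): slope fixed at zero
    r, s = 1, 1
    Z = math.sqrt((rss_null - rss_full) / rss_full)
    lam = (rss_full / rss_null) ** (n / 2)
    z = math.log(Z) + 0.5 * math.log((n - r - s) / r)
    return lam, Z, z

# Illustrative data only (an assumption, not from the text).
t = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]
lam, Z, z = slope_test(t, x)
print(lam, Z, z)
```

Here $e^{2z} = (n-2)Z^2$ is the square of the usual t for the slope, in line with the remark that the z-distribution reduces to the t-distribution when $\nu_1 = 1$.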


26.44. We will indicate briefly, without going into the complicated mathematics involved, some interesting results obtained by P. C. Tang (1938) and P. L. Hsu (1941b) concerning the power of the z-test as applied to linear hypotheses.

The functions $S_a^2$ and $S_b^2$, as we have seen, are distributed independently in the $\chi^2$-form, and their ratio accordingly in Fisher’s form. From this viewpoint the test of the linear hypothesis is a generalisation of the test of homogeneity in the analysis of variance. Tang considers the distribution of

$$E^2 = \frac{S_b^2}{S_a^2 + S_b^2} = \frac{Z^2}{1 + Z^2} \qquad (26.109)$$

and the variation for errors of the second kind, namely, when the values $\theta_1, \ldots, \theta_r$ are different from the specified values. He shows that the power of the test depends, not on individual alternative values, but on a single function of the $\theta$’s. He also obtains the power function and tabulates it.

Hsu then considers other possible tests which are based on this single function and shows that in this class of test the z-test or the equivalent $E^2$-test is the uniformly most powerful.


26.45. For large samples, when maximum likelihood estimators of the parameters exist, the distribution of $-2\log\lambda$ is that of $\chi^2$ with s degrees of freedom. For the distribution may then be written (see 17.46):


dF = Aexp = 5 = Gin (6, — 9;) (0, — ) \ doy. « abe 


so that p(Q max.) =A. -. 2 : : . (26.110) 
If 6, ... 6, are fixed the likelihood becomes 


n renee ° 
p = Aexp [- 3 ~ Sik %j Z_ — ual, 
where ye = > gine O, —6,) (G, —0,) «= “2 ~ Seceemnny 
7, k= 


and z; is given by 6, — 6; — L; where L, is a linear function of the r specified parameters. 
Thus— 


p(w max.) = A,e®%, . : ‘ ‘ ¢ (26112) 
where A, is the value of A when 6, takes its true value 0,,. Thus, when H, is true, 
Me : : : : . (26.113) 
But the characteristic function of $\chi^2$ $(= -2 \log \lambda)$ is
| poet dd, ... db... 


2 ee ; e : 
=A | exp {7 52 Mint + ¥2 (at = 1} a6, OG oO d6,4¢ 
Ht 
$$= \frac{1}{(1 - 2it)^{s/2}}.$$
This is the characteristic function of a quantity distributed as $\chi^2$ with s degrees of freedom, and hence the result follows.


26.46. In concluding this chapter we may mention briefly a question which frequently presents itself when statistical hypotheses are being tested in practice. Our tests are based on the observed values obtained in the sampling process, and in order to apply them we require no prior knowledge of the parameters to which they relate. They can be used in a state of complete ignorance about the parameters. But suppose some information is already available; or suppose that we attach varying degrees of importance to the avoidance of particular types of error. How far are the tests developed in this chapter to be modified?


26.47. Consider, for example, the situation which has already been mentioned in connection with the theory of estimation, of the chemist who is assaying the strength of a particular drug. If the drug has harmful effects in large quantities it may be much more important for him to detect cases in which the true strength exceeds his hypothetical value than when the true strength is deficient. Again, the manufacturer of a “guaranteed” product is usually much more concerned with ensuring that it does not fall below the guaranteed standard than that it exceeds such standard. In such circumstances we may be particularly interested in “one-sided” tests of the type $\xi < \xi_0$, and as we have seen, there more often occur U.M.P. tests for this class of alternative than in the case when $\xi$ can have any value. We might, therefore, be quite ready to accept such a test, knowing quite well that it may be insensitive in part of the range of the unknown parameter, merely because errors in that range are relatively unimportant.

Similarly we might be willing to accept a test which had a poor discriminatory power 
in part of the range but compensating advantages elsewhere, simply because we know 
beforehand that values of the parameter rarely or never fall into that particular part of 
the range. This is equivalent to prior knowledge of the distribution of the values 
determining the alternative hypotheses. 


26.48. It is difficult to reduce rather vague prior knowledge of a parameter to numerical form, and hence to extend our theory with great precision to cover these cases; but in
practice it is desirable to consider, before adopting a test, whether any prior knowledge is 
available, or whether our interests centre on particular parts of the range. If they do, we 
may consider the behaviour of power functions of the possible tests at our disposal and 
examine which is the more powerful test in the particular part of the range which interests 
us most. The mere fact that the theory developed in this and the succeeding chapter 
makes no assumptions about the prior probabilities of admissible alternatives does not 
mean that we should be acting sensibly in ignoring any prior information which may be 
at hand when applying the theory, or that we need feel compelled to apply tests with 
optimum properties in regions where we know the unknown parameter-values will not fall. 
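
The kind of comparison of power functions suggested in 26.48 can be sketched for the simplest case, a normal mean with known variance. The code compares a one-sided with a two-sided test whose critical regions have the same size. The function names, the parameter `level` (the size of the critical region) and the standardised shift `delta` are assumptions introduced for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def power_one_sided(delta, level=0.05):
    """Power of the test rejecting for large z at standardised shift delta."""
    critical = STD_NORMAL.inv_cdf(1 - level)
    return 1 - STD_NORMAL.cdf(critical - delta)

def power_two_sided(delta, level=0.05):
    """Power of the equal-tails test rejecting for large |z|."""
    critical = STD_NORMAL.inv_cdf(1 - level / 2)
    return (STD_NORMAL.cdf(-critical - delta)
            + 1 - STD_NORMAL.cdf(critical - delta))

# delta = (true mean - hypothetical mean) * sqrt(n) / sigma
for delta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(delta, round(power_one_sided(delta), 4), round(power_two_sided(delta), 4))
```

The one-sided test is the more powerful for positive shifts and almost powerless for negative ones, which is the trade that may be acceptable when errors in one direction matter little.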


NOTES AND REFERENCES 


The theory of this chapter is very largely due to Neyman and E. S. Pearson, whose treatment has been closely followed. In their first contribution to the subject (1928) the likelihood criterion was developed, the theory of first and second kind of errors and power of tests being given in 1933. For the theory of unbiassed tests, see the papers of 1936 and 1938. In the last few years the literature has grown considerably.

Feller (1938) has shown that similar regions only exist in rather exceptional circumstances and that the theory of composite hypotheses is incomplete. Tables of certain power functions and distributions associated with likelihood tests are given by Mahalanobis (1933), Neyman and Tokarska (1936b), Wilks and Thompson (1937a), P. C. Tang (1938), David (1939), Nayer (1936), and in Tables for Statisticians, Part II (Tables 35-37). See also Mahalanobis (1933).

For tests based on the likelihood ratio, see Neyman and Pearson (1928, 1931a, 1931b), Pearson and Wilks (1933b), Wilks (1935a), Nayer (1936), Welch (1936a), R. W. Jackson (1936), Sukhatme (1936b), Bartlett (1937c), Wilks and Thompson (1937a), Wilks (1938a), Bishop (1939), G. W. Brown (1939), Mood (1939), Hartley (1940), Wald and Brookner (1941b).

For the general theory, see also Welch (1935), Kolodziejczyk (1935), Neyman (1935, 1937b, 1938), Daly (1940), Pitman (1939b), Wald (1939a, 1941a), Wolfowitz (1942), E. S. Pearson (1941, 1942a), Dantzig (1940), P. L. Hsu (1941b), Simaika (1941), MacStewart (1941), Scheffé (1942b, 1943).


EXERCISES 


26.1. Examine the following argument : To accept H when it is false is equivalent 
to rejecting not-H when not-H is true. Hence, if K = not-H, to commit an error of the 
second kind for H is to commit an error of the first kind for K; and thus there is 
no distinction between the first and second kinds of error. 


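26.2. For the distribution
$$dF = \beta e^{-\beta(x - \gamma)} \, dx, \quad x \ge \gamma; \qquad dF = 0, \quad x < \gamma,$$
show that for a hypothesis $H_0$ that $\beta = \beta_0$, $\gamma = \gamma_0$ and an alternative $H_1$ that $\beta = \beta_1$, $\gamma = \gamma_1$, the best critical region is the region $W_1$ where $p_0 = 0$, together with the region $W_2$ defined by

$$\bar{x} \le \frac{1}{\beta_1 - \beta_0} \left\{ \gamma_1 \beta_1 - \gamma_0 \beta_0 - \frac{1}{n} \log k + \log \frac{\beta_1}{\beta_0} \right\},$$

provided that the admissible hypothesis is restricted by the conditions $\gamma_1 < \gamma_0$, $\beta_1 > \beta_0$. Hence show that a U.M.P. test exists in such circumstances.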


(Neyman and Pearson, 1936a. This shows that a U.M.P. test can exist for more than one unknown 
parameter.) 


26.3. If the distribution function of x, ... 2, is given by 


| n 2 R 5 
S| Oy %; — ny) —4 Di} ae siete MES, 


y,¢6> 0, =— © 4a... ae 
show that the frequency function may be put in the form 
n? (% — y)? , 
fia exp( — 7-55" ) exp(— 423); 
and hence that ¢ is a “ shared ” estimator sufficient for y and o. Show further that the 
best critical regions for yo, o differ according as o2 > 62, a2 <o? or o =a», and that 


their boundaries depend on y. Hence no U.M.P. test exists for admissible alternatives 
“o 0, 


IN 
8 


(Neyman and Pearson, 1936a.) 




26.4. In the previous exercise put o = y and consider the class of hypothesis y > 0. 
Show that there are different best critical regions according as py > yo, y < yo and that 
their boundaries depend on y. Hence there is no U.M.P. test, but < is sufficient for y. 

(Neyman and Pearson, 1936a.) 


26.5. In samples from a normal population, show that the probability of accepting
the hypothesis that the mean $\mu \le \mu_0$ when, in fact, it is false and $\mu = \mu_1 > \mu_0$ (that is,
the probability of an error of the second kind) is


n\ 1 ~ ar nv? 1 v—p 2, | 
© em FE) J, ad Ga oo ee 


a) Se al 


where p 
: Oo 


and ¢ is the value of ~ = ld corresponding to the significance level 1 — « for the control 


of errors of the first kind. 
(Neyman and Tokarska, 1936b.) 


26.6. In six samples of six members each the following values were obtained— 


Sample.    Mean.    $s_i^2$.
   1       8433      24,722
   2       8200      94,133
   3       7933     149,733
   4       8120      45,037
   5       7971      88,480
   6       8263      49,921


with $s_0^2 = 104,588$, $s_a^2 = 75,338$.

Show that $L_1 = 0.8508$ and $L_0 = 0.6219$. The 5-per-cent. levels are respectively
0.67 and 0.54, so that there is no evidence of heterogeneity.
(Pearson, appendix to papers by Wilsdon, 1934). 
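The first criterion can be checked numerically. The sketch below assumes, since the exercise does not spell it out, that for samples of equal size the criterion quoted as 0.8508 is the ratio of the geometric mean to the arithmetic mean of the six tabulated sample variances; the function name `l1_criterion` is illustrative only.

```python
import math

# Sample variances from the table of Exercise 26.6 (six samples of six).
variances = [24722, 94133, 149733, 45037, 88480, 49921]


def l1_criterion(vs):
    """Geometric mean of the variances divided by their arithmetic mean.

    Assumption: equal sample sizes, so every variance has the same weight.
    """
    arithmetic = sum(vs) / len(vs)
    geometric = math.exp(sum(math.log(v) for v in vs) / len(vs))
    return geometric / arithmetic


arithmetic_mean = sum(variances) / len(variances)
L1 = l1_criterion(variances)
print(round(arithmetic_mean), round(L1, 4))
```

The arithmetic mean reproduces the quoted value 75,338, and the ratio agrees with 0.8508 to about three decimal places. The second criterion is not recomputed here because it depends on how the between-sample variation enters, which the exercise leaves to the reader.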


26.7. Verify that the likelihood ratio leads to “Student’s” test for an unknown
mean in normal samples, to the use of Fisher’s $z$ in testing the equality of two variances,
and to the $t$-test for the difference of two means in normal populations with the same
variance.


26.8. If samples n,... 7 are drawn from the populations 
dF = > exp(~"—*t) ar, faite 
i i 
use the likelihood ratio to test the hypothesis H, that the populations are identical, 
showing that 


k 
ihe a B (t, — wy)” mE lis , say, 
(% — ey)” ie 




306 GENERAL THEORY OF SIGNIFICANCE-TESTS 


where 2, is the mean of the ith sample, z;, is the smallest member of that sample, Z, is the 
mean of all samples together and x , is the smallest value in all samples together. 
Show that the distribution of x, and J; is 


1 ni = Nn; (I; + 2; ) 
en (pe Es SER 
x(q yt) ae rca 


and hence the moments of Ly are 


(rhe 1 4 2M 
_ Nerin—1) & r(m el 
TO a 


If H, is the hypothesis that the populations have the same o but any possible different 
B's, show that 
IT1,% 


IN’ 


EN Ape 


where 7 is the weighted mean of the l’s, and that 


N?I'(N —k) 
Mp (Ly) = ; 
rN —k ls 
( + p) Rea eet 
If H, is the hypothesis that the populations, being known to have identical o’s, have 
the same f, show that the distribution of 


Sls 


1 
Ly = Ag = 


iN a 


Cae 


Lo") (1 —L,)'-2 de 
(Sukhatme, 1936b). 
26.9. In the notation of 26.36 show that, if $H$ is true, the criteria $\lambda_{H_1}$ and $\lambda_{H_2}$ are
distributed independently.
(Neyman and Pearson, 19310). 


CHAPTER 27 


GENERAL THEORY OF SIGNIFICANCE-TESTS—(2) 


Bias in Statistical Tests 

27.1. In considering the problem of estimation by confidence intervals in Chapter 19
we had occasion to remark on the rather arbitrary nature of determining the interval so
that both inequalities $\theta_0 < \theta$ and $\theta < \theta_1$ had an equal chance $\tfrac{1}{2}\alpha$ of fulfilment. A point
of a similar nature arises in the testing of hypotheses, particularly when an asymmetrical
sampling distribution for the criterion is concerned. Consider, for instance, the testing
of the hypothesis that in a normal sample of $n$ members the standard deviation $\sigma$ has an
assigned value $\sigma_0$ irrespective of the mean $\mu$. As we have seen in Example 26.3, there is
no U.M.P. test for all $\sigma > 0$, though there is one for $\sigma > \sigma_0$ and another for $\sigma < \sigma_0$. In
choosing a test to cover the whole range $\sigma > 0$ we have, therefore, a certain freedom of
choice, since there exists no “best” test as we have previously defined the term. A
common test in practical use is to take the sample variance $s^2$ and accept the hypothesis
$\sigma = \sigma_0$ if and only if
$$s_1^2 \le s^2 \le s_2^2, \qquad (27.1)$$
where $s_1^2$ and $s_2^2$ are determined from the distribution of $s^2$, namely
$$dF \propto s^{n-3} \exp\left(-\frac{ns^2}{2\sigma_0^2}\right) d(s^2), \qquad (27.2)$$
such that
$$\int_0^{s_1^2} dF = \int_{s_2^2}^{\infty} dF = \tfrac{1}{2}(1-\alpha). \qquad (27.3)$$
In short, $s_1^2$ and $s_2^2$ are chosen so as to cut off equal “tail” areas of the distribution. This
procedure will, of course, control errors of the first kind; but so equally well would the
selection of $s_1^2$ and $s_2^2$ so that
$$\int_0^{s_1^2} dF = \tfrac{1}{2} - \alpha_1 \qquad (27.4)$$
and
$$\int_{s_2^2}^{\infty} dF = \tfrac{1}{2} - \alpha_2, \qquad (27.5)$$
provided that $\alpha_1 + \alpha_2 = \alpha$. Thus we have an infinite number of regions which will control
errors of the first kind. It is natural to seek for some criterion which will distinguish one
as better than the others, recognizing that no U.M.P. test exists.


27.2. Such a criterion arises naturally from the following consideration. In the
example given, with $\alpha_1 = \alpha_2 = \tfrac{1}{2}\alpha$, let us calculate the power of the test for different values
of $\sigma$. This can readily be done from the distributions of type (27.2) by means of the
incomplete $\Gamma$-function or the equivalent $\chi^2$ integral. For any given $\sigma$ we have to find
$$P\{E \in w \mid \sigma\} = \left\{\int_0^{s_1^2} + \int_{s_2^2}^{\infty}\right\} dF, \qquad (27.6)$$
where
$$dF = \frac{1}{\Gamma\{\tfrac{1}{2}(n-1)\}} \left(\frac{ns^2}{2\sigma^2}\right)^{\frac{1}{2}(n-3)} \exp\left(-\frac{ns^2}{2\sigma^2}\right) d\left(\frac{ns^2}{2\sigma^2}\right). \qquad (27.7)$$

Fig. 27.1, adapted from Neyman and Pearson (1936), shows the relation between
the power function $P$ and $\sigma^2$ for $\alpha_1 = \alpha_2 = 0.49$, $n = 3$, the rejection level being 0.02.

[Fig. 27.1. Power Curve in Samples of 3 for $\sigma^2$ from a Normal Population (see text). Vertical axis: power of test; horizontal axis: $\sigma^2$ in sampled population (in units of $\sigma_0^2$).]

We see that for $\sigma^2 > 1 = \sigma_0^2$ the power increases, and so also for $\sigma^2 < \tfrac{1}{2} = \tfrac{1}{2}\sigma_0^2$. But
between $\tfrac{1}{2}\sigma_0^2$ and $\sigma_0^2$ the power is less than 0.02, i.e. less than $1 - \alpha$. Hence for such values
the chance of an error of the second kind, namely, the acceptance of a false hypothesis,
would be greater than the chance of an error of the first kind, namely, the rejection of
a true hypothesis.
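The dip in the power curve can be reproduced with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the authors' computation: it takes $n = 3$, so that $ns^2/\sigma^2$ is distributed as $\chi^2$ with 2 degrees of freedom and the tail areas have a closed form, and it writes `tau` for $\sigma^2/\sigma_0^2$. The function name is hypothetical.

```python
import math


def power_equal_tails_n3(tau, size=0.02):
    """Power of the equal-tails variance test for samples of n = 3.

    tau is sigma^2 / sigma_0^2. For n = 3, n*s^2/sigma^2 is chi-square with
    2 degrees of freedom, whose distribution function is 1 - exp(-c / 2).
    """
    c1 = -2.0 * math.log(1.0 - size / 2.0)  # lower limit: P(chi2 <= c1) = size/2
    c2 = -2.0 * math.log(size / 2.0)        # upper limit: P(chi2 >= c2) = size/2
    lower_tail = 1.0 - math.exp(-c1 / (2.0 * tau))
    upper_tail = math.exp(-c2 / (2.0 * tau))
    return lower_tail + upper_tail


for tau in (0.25, 0.5, 0.75, 1.0, 2.0):
    print(tau, round(power_equal_tails_n3(tau), 4))
```

With a rejection level of 0.02 the power equals 0.02 at `tau = 1` and again at `tau = 0.5`, and falls below 0.02 between those two values, which is the biassed range described in the text.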


27.3. Whether this is felt to be anomalous depends on the relative importance of
the two kinds of error in particular cases; but, other things being equal, it may be felt
more important to avoid the second kind than the first, and not to have a greater probability
of accepting the hypothesis when it is false than of rejecting it when it is true. This, at any
rate, is the basis of the criterion which we proceed to discuss, namely, that the critical region
$w$ should be chosen so that $P(E \in w)$ is a minimum when the hypothesis tested is true.

Consider then the case when $H_0$ ascribes to a parameter $\theta$ the value $\theta_0$, and the
admissible alternatives ascribe other values to $\theta$ but do not differ from $H_0$ in other respects. We
shall say that $w$ is an unbiassed critical region if, and only if,
$$\int_w p(\theta_0)\,dx = P(E \in w \mid \theta_0) = 1 - \alpha, \qquad (27.8)$$
and for any other $\theta$, say $\theta'$,
$$\int_w p(\theta')\,dx = P(E \in w \mid \theta') \ge 1 - \alpha. \qquad (27.9)$$

Equation (27.8) expresses the usual control of errors of the first kind and (27.9) the
minimising property of $w$. If a region is not unbiassed it will be said to be biassed.


27.4. In certain cases there will exist among the unbiassed regions a $w_0$ such that
$$\int_{w_0} p(\theta')\,dx \ge \int_w p(\theta')\,dx \qquad (27.10)$$
for all admissible $\theta'$. Such a region may be called the best unbiassed critical region and
the test based on it the uniformly most powerful unbiassed test, or briefly the U.M.P.U.
test. It minimises the risk of errors of the second kind among the class of unbiassed tests.
As we shall see presently, U.M.P.U. tests do in fact exist in certain cases.

The use of the word “unbiassed” in this connection is rather arbitrary and is not to
be interpreted as meaning that biassed tests will give systematically wrong results, or that
unbiassed tests are based on unbiassed estimators. Fortunately the different uses of
the term “bias” usually occur in different contexts and confusion is infrequent.


Unbiassed Regions of Type A 


27.5. Following Neyman and Pearson, we now define an unbiassed critical region
of Type A as one for which
$$\int_w p(\theta_0)\,dx = 1 - \alpha, \qquad (27.11)$$
$$\left[\frac{\partial}{\partial\theta}\int_w p\,dx\right]_{\theta=\theta_0} = 0, \qquad (27.12)$$
and
$$\left[\frac{\partial^2}{\partial\theta^2}\int_w p\,dx\right]_{\theta=\theta_0} \text{ is a maximum.} \qquad (27.13)$$

We shall, as usual, assume that the differential coefficients exist and shall also assume that
differentiation may be carried out under the integral sign, so that we have for all $w$,
$$\frac{\partial}{\partial\theta}\int_w p\,dx = \int_w \frac{\partial p}{\partial\theta}\,dx = \int_w p'\,dx, \text{ say,} \qquad (27.14)$$
and similarly for the second differential coefficient, which we denote by $p''$.

The first condition (27.11) controls errors of the first kind; the second makes the
region $w$ locally unbiassed; the third, (27.13), implies that as $\theta$ departs from $\theta_0$ the power
function increases more rapidly than for any other unbiassed critical region of the same
size. Thus in the neighbourhood of $\theta_0$ the test may be said to be better than others of the
unbiassed type. It may not be better for larger values of $|\theta - \theta_0|$, but the Type A tests
are based on the supposition that it is more important to detect small errors of the second
kind than to minimise the risk of large errors, which will probably be detected in any case.


27.6. The regions of Type A may be found by the use of the following theorem:
the region $w_0$ is an unbiassed critical region of Type A if, within $w_0$,
$$p''(\theta_0) \ge k_1 p'(\theta_0) + k_2 p(\theta_0), \qquad (27.15)$$
and outside $w_0$,
$$p''(\theta_0) \le k_1 p'(\theta_0) + k_2 p(\theta_0), \qquad (27.16)$$
where $p'(\theta_0) = [\partial p/\partial\theta]_{\theta=\theta_0}$, $p''(\theta_0) = [\partial^2 p/\partial\theta^2]_{\theta=\theta_0}$, and $k_1$, $k_2$ are chosen so as
to satisfy the side conditions (27.11) and (27.12).

Suppose that $F_0, F_1, \ldots, F_m$ are functions of $x_1, \ldots, x_n$ and that
$$\int_w F_i\,dx = c_i, \text{ a constant} \quad (i = 1, \ldots, m). \qquad (27.17)$$
Let $w_0$ be a region such that inside it
$$F_0 \ge \sum_{i=1}^{m} k_i F_i, \qquad (27.18)$$
and outside it
$$F_0 \le \sum_{i=1}^{m} k_i F_i, \qquad (27.19)$$
where the $k$'s are constants chosen so as to satisfy (27.17). Then for any $w$ for which
(27.17) is valid
$$\int_w F_0\,dx \le \int_{w_0} F_0\,dx. \qquad (27.20)$$

In fact, let $ww_0$ be the common part, if any, of $w$ and $w_0$. As both $w$ and $w_0$ satisfy (27.17),
we have
$$\int_{w - ww_0} F_i\,dx = \int_{w_0 - ww_0} F_i\,dx. \qquad (27.21)$$
Hence
$$\int_{w_0} F_0\,dx - \int_w F_0\,dx = \int_{w_0 - ww_0} F_0\,dx - \int_{w - ww_0} F_0\,dx$$
$$\ge \int_{w_0 - ww_0} \sum (k_i F_i)\,dx - \int_{w - ww_0} \sum (k_i F_i)\,dx$$
$$\ge 0,$$
in virtue of (27.21).

In our present case take $F_0$ as $p''(\theta_0)$ and $F_1$, $F_2$ as $p'(\theta_0)$, $p(\theta_0)$ respectively. Then
(27.20) is true, and hence (27.13) is satisfied if (27.18) and (27.19) are true; and these will
be found to reduce to conditions (27.15) and (27.16). The theorem follows.


27.7. If (27.14) holds, and if there exists a sufficient estimator $t$ for $\theta$, then the
Type A region is bounded by surfaces of constant $t$. For then we have
$$p(\theta) = p_1(t, \theta)\,p_2(x), \qquad (27.22)$$
and hence, from (27.15), on substitution,
$$p_1''(t, \theta_0) \ge k_1 p_1'(t, \theta_0) + k_2 p_1(t, \theta_0)$$
within $w_0$ and conversely outside it. The equality must hold on the boundary, which
is equivalent to the theorem.


27.8. Writing
$$\phi = \left[\frac{\partial}{\partial\theta}\log p\right]_{\theta=\theta_0}, \qquad (27.23)$$
$$\phi' = \left[\frac{\partial^2}{\partial\theta^2}\log p\right]_{\theta=\theta_0}, \qquad (27.24)$$
we have
$$p'(\theta_0) = \phi\,p(\theta_0), \qquad p''(\theta_0) = (\phi' + \phi^2)\,p(\theta_0),$$
and hence the inequality (27.15) reduces to
$$\phi' + \phi^2 \ge k_1\phi + k_2 \qquad (27.25)$$
within $w_0$, wherever $p(\theta_0)$ does not vanish; and conversely outside $w_0$.
We may distinguish three special cases:
(a) If $\phi'$ is a function of $\phi$, say $F(\phi)$, we have
$$F(\phi) + \phi^2 \ge k_1\phi + k_2, \qquad (27.26)$$
and the Type A region is bounded by the surfaces
$$\phi = c_i, \quad i = 1, \ldots, m, \qquad (27.27)$$
where $m$ is the number of roots of (27.26), taken as an equation in $\phi$. In this case, as we saw in 17.30, there exists
a sufficient estimator. It follows that $w_0$ is defined by inequalities of the type
$$c_i \le \phi \le c_{i+1},$$
and we may, as in 26.24, use the $\phi$'s as new co-ordinates and calculate the size of a region
from their distribution functions.
(b) As a simple case of (a), if
$$\phi' = A + B\phi, \qquad (27.28)$$
we find, for (27.26),
$$\phi^2 - k_1'\phi - k_2' = 0, \qquad (27.29)$$
where $k_1' = k_1 - B$ and $k_2' = k_2 - A$, and the limits of $\phi$ are given by the two roots of this equation.
(c) If $\phi'$ cannot be expressed as a function of $\phi$ which does not involve the $x$'s explicitly,
we shall have
$$\phi' \ge k_2 + k_1\phi - \phi^2. \qquad (27.30)$$
In this case, considering $\phi$ and $\phi'$ as two co-ordinates of a point in a plane, we see that
the region for which (27.30) is true is the one “above” the parabola $\phi' = k_2 + k_1\phi - \phi^2$,
and that $k_1$, $k_2$ are determined by
$$\iint_{w_0} p(\phi, \phi')\,d\phi\,d\phi' = 1 - \alpha, \qquad (27.31)$$
$$\iint_{w_0} \phi\,p(\phi, \phi')\,d\phi\,d\phi' = 0. \qquad (27.32)$$
In this instance we can reduce the problem to two dimensions by using two new co-ordinates
$\phi$, $\phi'$.


Example 27.1
Consider the normal distribution
$$dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\}\,dx.$$
To apply the foregoing theory with complete rigour we have to show that (27.14) is true.
We shall assume that this is so, referring the reader for a formal proof to Neyman and
Pearson (1936).
We have, then, with $\theta = \mu$,
$$\log p(x) = \text{constant} - \tfrac{1}{2}\sum (x - \mu)^2,$$
$$\phi = \sum (x - \mu_0); \qquad \phi' = -n,$$
and hence this case reduces to that of (27.28). We write
$$\phi = n(\bar{x} - \mu_0),$$
and can clearly use $\bar{x}$ instead of $\phi$ as a co-ordinate, which confirms the result of 27.7 since
$\bar{x}$ is sufficient for $\mu$.
It follows that the unbiassed region of Type A is given by
$$\bar{x} \le \bar{x}_1, \qquad \bar{x} \ge \bar{x}_2,$$
where
$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\,d\bar{x} = \alpha$$
and
$$\int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})(\bar{x} - \mu_0)\,d\bar{x} = 0.$$

Now if $H_0$ is true, that is if $\mu = \mu_0$, $\bar{x}$ is distributed in the form
$$dF = \sqrt{\frac{n}{2\pi}} \exp\left\{-\frac{n}{2}(\bar{x} - \mu_0)^2\right\} d\bar{x}.$$

Hence $\bar{x}_1 - \mu_0 = -(\bar{x}_2 - \mu_0)$ and the Type A region is defined as being outside the range
$$\mu_0 - \frac{\lambda}{\sqrt{n}} \le \bar{x} \le \mu_0 + \frac{\lambda}{\sqrt{n}},$$
where $\lambda$ is given by
$$\frac{1}{\sqrt{2\pi}}\int_{-\lambda}^{\lambda} e^{-\frac{1}{2}x^2}\,dx = \alpha.$$

In this case the Type A test leads to the usual test based on equal tail areas. The
same test follows from the likelihood ratio, as the reader can verify for himself.
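As a numerical illustration of Example 27.1, the following sketch computes the half-width of the acceptance range and the power for a shifted mean. It assumes unit variance as in the example, writes `alpha` for the acceptance probability under the hypothesis (so the rejection level is `1 - alpha`), and writes `delta` for the shift of the true mean from the hypothetical one; the function names are illustrative and not from the text.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standard normal distribution


def type_a_lambda(alpha):
    """Half-width lam with P(|Z| <= lam) = alpha for a standard normal Z."""
    return STANDARD.inv_cdf(0.5 * (1.0 + alpha))


def power(delta, n, alpha):
    """Probability of rejection when the true mean is mu_0 + delta.

    sqrt(n) * (xbar - mu_0) is normal with mean delta * sqrt(n) and unit
    variance; the hypothesis is rejected when it falls outside (-lam, lam).
    """
    lam = type_a_lambda(alpha)
    shift = delta * n ** 0.5
    return STANDARD.cdf(-lam - shift) + 1.0 - STANDARD.cdf(lam - shift)


print(type_a_lambda(0.98))
print(power(0.0, 10, 0.98), power(0.3, 10, 0.98), power(-0.3, 10, 0.98))
```

The power is symmetric in `delta` and never falls below `1 - alpha`, which is what unbiassedness requires of the equal-tails region in this symmetric case.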


Example 27.2 
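If the distribution is normal with zero mean and variance $\sigma^2$, and $H_0$ is that $\sigma = \sigma_0$,
we find
$$\phi = \frac{1}{\sigma_0}\left\{\frac{\sum x^2}{\sigma_0^2} - n\right\} = \frac{1}{\sigma_0}(v - n), \text{ say.}$$
This also satisfies (27.28), and the Type A region will be defined by
$$v \le v_1, \qquad v \ge v_2,$$
where
$$\int_{v_1}^{v_2} p(v)\,dv = \alpha$$
and
$$\int_{v_1}^{v_2} p(v)(v - n)\,dv = 0.$$
Here $p(v)$, the frequency function of the second moment, is
$$p(v)\,dv = \frac{1}{2^{\frac{1}{2}n}\,\Gamma(\tfrac{1}{2}n)}\,v^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}v}\,dv,$$
and we find, for the second equation,
$$\int_{v_1}^{v_2} v^{\frac{1}{2}n}\,e^{-\frac{1}{2}v}\,dv - n\int_{v_1}^{v_2} v^{\frac{1}{2}n-1}\,e^{-\frac{1}{2}v}\,dv = 0.$$

Integrating the first member by parts, $v^{\frac{1}{2}n}$ being one part, we are left with
$$\left[-2v^{\frac{1}{2}n}\,e^{-\frac{1}{2}v}\right]_{v_1}^{v_2} = 0$$
or
$$v_1^{\frac{1}{2}n}\,e^{-\frac{1}{2}v_1} = v_2^{\frac{1}{2}n}\,e^{-\frac{1}{2}v_2}.$$

This has to be solved in conjunction with
$$\frac{1}{2^{\frac{1}{2}n}\,\Gamma(\tfrac{1}{2}n)}\int_{v_1}^{v_2} v^{\frac{1}{2}(n-2)}\,e^{-\frac{1}{2}v}\,dv = \alpha.$$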

The numerical solution can be carried out by successive approximation or graphically.
In this connection Fig. 27.2 is of interest. It shows, for samples of two and $\alpha = 0.98$,
the graphs of the power function for the ordinary test with equal tail areas, in addition to
the power functions for the Type A test, the U.M.P. test with $\sigma > \sigma_0$ and the U.M.P. test
with $\sigma < \sigma_0$.
Evidently, for $\sigma > \sigma_0$ the best critical region (2) has the greatest power (as it must
have), and for $\sigma < \sigma_0$ the best region (1) has the greatest power. The test based on equal
tail areas has a greater power than the Type A test for $\sigma > \sigma_0$ but a lower power for $\sigma < \sigma_0$,
besides being biassed, as we have seen.

[Fig. 27.2. Power Curves of Four Different Tests of the Variance in Normal Samples of 2 (see text). Vertical axis: power of test; horizontal axis: $\sigma^2$ (in units of $\sigma_0^2$).]

As $n$ becomes larger the same effects persist, but the Type A and the “equal tails”
tests become closer together in power. For samples of 20 or more there seems to be no
serious loss in using the latter since the range of bias and its magnitude are then very small.
If, of course, we knew in practice that $\sigma > \sigma_0$ we should use the U.M.P. test, and cases may
arise, even when such knowledge is lacking, where “one-sided” hypotheses of this kind
are all that concern us.

Invariance Theorem for Type A Regions 

27.9. It is important to show that the regions selected on the basis of Type A criteria
conform to corresponding criteria if some other function $\tau(\theta)$ is used instead of $\theta$ itself.
In Example 27.2, for instance, where we took $\theta$ to be the standard deviation $\sigma$, should we
have obtained the same regions if we had taken $\theta$ to be the variance $\sigma^2$? The answer is
affirmative under certain general conditions, as we should expect from the relationship
with sufficient estimators.
Suppose we have a new parameter $\tau$, given by
$$\theta = \theta_0 + f(\tau) = \psi(\tau), \qquad (27.33)$$
where $f(0) = 0$. Then if $p(\psi)$ satisfies (27.14) and the similar equation in second
differentials, if $\psi$ is monotonically increasing and $[d\psi/d\tau]_{\tau=0} > 0$, then the region based on $\tau$ is an
unbiassed critical region if that based on $\theta$ is so. It is sufficient to show that (27.15)
and (27.16) are satisfied for $\tau$. Now
$$\theta = \psi(\tau), \quad \psi(0) = \theta_0, \quad \left[\frac{d\psi}{d\tau}\right]_{\tau=0} = \psi' \ne 0, \quad \left[\frac{d^2\psi}{d\tau^2}\right]_{\tau=0} = \psi'' \text{ (say).} \qquad (27.34)$$
Thus, writing $p'_\tau$, $p''_\tau$ for the derivatives with respect to $\tau$ at $\tau = 0$ and $p'_\theta$, $p''_\theta$ for those with respect to $\theta$ at $\theta = \theta_0$,
$$p'_\tau = p'_\theta\,\psi',$$
and
$$p''_\tau = p''_\theta\,\psi'^2 + p'_\theta\,\psi''.$$

Solving these for $p'_\theta$ and $p''_\theta$ and substituting in (27.15) and (27.16), we find
$$p''_\tau \ge k_1'\,p'_\tau + k_2'\,p \qquad (27.35)$$
within $w_0$ and the contrary outside, where
$$k_1' = k_1\psi' + \frac{\psi''}{\psi'}, \qquad k_2' = k_2\,\psi'^2. \qquad (27.36)$$

The result follows.


Regions of Type $A_1$

27.10. The regions of Type A are determined so that tests based on them are
U.M.P.U. in the neighbourhood of $\theta_0$. We now consider a region, said to be of Type $A_1$,
which is U.M.P.U. everywhere, i.e. which obeys (27.11) and (27.12) but has, in place of
(27.13),
$$\int_{w_0} p\,dx \ge \int_w p\,dx \qquad (27.37)$$
for every admissible $\theta$ and every $w$ satisfying the other two conditions.
It is conceivable that (27.37) does not entail the existence of a U.M.P.U. test, for there
might be an unbiassed region of size $1 - \alpha$ for which the derivative of $\int_w p\,dx$ did not exist
at $\theta = \theta_0$ but which nevertheless gave a more powerful test. This refinement, however,
need not detain us.


27.11. If $W_+$ represents the sample-space where the density is not zero, if
$$\phi' = A + B\phi,$$
and if $\phi(\theta_0)$ does not vanish identically in $W_+$, then the unbiassed critical region of Type A
is necessarily of Type $A_1$.
Let $w_0$ be the Type A region, which is determined ex hypothesi by two numbers $c_1$
and $c_2$, such that
$$c_1 \le \phi \le c_2 \quad \text{outside } w_0.$$

We have to show that
$$\int_{w_0} p\,dx \ge \int_w p\,dx$$
for all admissible $\theta$ and any $w$ for which
$$\int_w p_0\,dx = 1 - \alpha, \qquad (27.38)$$
with the consequence that
$$\int_w p_0\,\phi_0\,dx = 0. \qquad (27.39)$$

Since $\phi' = A + B\phi$ we have, solving this equation as a linear differential equation
of the first degree,
$$\phi = \left\{\int A \exp\left(-\int B\,d\theta\right) d\theta + T\right\} \exp\left(\int B\,d\theta\right). \qquad (27.40)$$

The reader may verify that this is a solution, and since it contains the arbitrary constant
$T$ it is the most general solution. It follows that we may write
$$\log p = P(\theta) + T\,Q(\theta) + f(x), \text{ say,} \qquad (27.41)$$
where $P$ and $Q$ do not depend upon $x$. We then have, primes denoting differentiation with
respect to $\theta$ and the suffix 0 relating to $\theta_0$,
$$\phi_0 = P_0' + T\,Q_0'. \qquad (27.42)$$
We note that $Q_0'$ cannot be zero, for if it were we should have
$$0 = \int_{W_+} \phi_0\,p_0\,dx = P_0' \int_{W_+} p_0\,dx = P_0',$$
which would imply that $\phi_0$ was identically zero.

In virtue of the lemma of 27.6, the proposition will be proved if we can show that
for fixed $\theta$ and $\theta_0$ there are two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but not on the
$x$'s, such that
$$p \ge p_0\,(a\phi_0 + b) \quad \text{inside } w_0 \qquad (27.43)$$
and the contrary outside $w_0$. Putting the values of $p$ and $\phi_0$ in this expression, we have
to show that $a$ and $b$ can be found such that, inside $w_0$,
$$\exp\{P(\theta) + T\,Q(\theta) + f(x)\} \ge \exp\{P(\theta_0) + T\,Q(\theta_0) + f(x)\}\,\{aP_0' + aT\,Q_0' + b\}$$
or, writing $r = P(\theta) - P(\theta_0)$, $q = Q(\theta) - Q(\theta_0)$, such that
$$\exp(r + qT) \ge a\,Q_0'\,T + a\,P_0' + b = a_1 T + b_1, \text{ say.} \qquad (27.44)$$
Here $q$ cannot be zero, for if it were $Q(\theta)$ would be equal to $Q(\theta_0)$ and, integrating the
frequency functions over $W_+$, we should find $r = 0$. The alternative hypothesis would
not then differ essentially from $H_0$.
Consider at the outset the case when $c_1$ and $c_2$ are different. From (27.42) we see
that $\phi_0$ depends only on $T$ so far as variation in $x$ is concerned, and that
$$\phi_0 = c_1 \text{ if } T = \frac{c_1 - P_0'}{Q_0'} = T_1 \text{ (say),} \qquad (27.45)$$
$$\phi_0 = c_2 \text{ if } T = \frac{c_2 - P_0'}{Q_0'} = T_2. \qquad (27.46)$$
$T_1$ and $T_2$ are different. Choose $a_1$ and $b_1$ so as to satisfy
$$e^{r + qT_1} - a_1 T_1 - b_1 = 0, \qquad e^{r + qT_2} - a_1 T_2 - b_1 = 0. \qquad (27.47)$$

Then (27.44) is satisfied at the boundary points and we have merely to prove that
$c_1 < \phi_0 < c_2$ implies $e^{r + qT} < a_1 T + b_1$, and that
$\phi_0 < c_1$ and $\phi_0 > c_2$ imply $e^{r + qT} > a_1 T + b_1$.
This follows from the fact that
$$y = e^{r + qT} - a_1 T - b_1 \qquad (27.48)$$
has only one minimum, between $T_1$ and $T_2$, as may be seen by differentiating it twice, for
the second derivative is positive and hence the first is a monotonically increasing function.
But $y$ vanishes at $T_1$ and $T_2$ and hence is negative between those values and positive
outside them.

Finally, if $c_1$ and $c_2$ are equal, say to $c$, we choose $a_1$ and $b_1$ so as to satisfy
$$P_0' + Q_0'\,T_1 = c, \qquad q\,e^{r + qT_1} - a_1 = 0, \qquad e^{r + qT_1} - a_1 T_1 - b_1 = 0. \qquad (27.49)$$

It will be found that $y$ has a minimum at $T = T_1$ and vanishes there. It follows that in
the region $\bar{w}_0$ complementary to $w_0$, where $\phi_0 = c$, we have
$$e^{r + qT} = a_1 T + b_1,$$
and thus in $w_0$, where $\phi_0 < c$ or $c < \phi_0$, the left-hand side must be greater than the
right-hand side, as (27.44) requires. The demonstration is complete.
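The geometric content of the last step is that a convex exponential lies below its chord between the two points of contact and above it elsewhere. A minimal numerical sketch of that fact follows; the values of `r`, `q`, `t1` and `t2` are arbitrary illustrative numbers, not taken from the text, and the function names are hypothetical.

```python
import math


def secant_coefficients(r, q, t1, t2):
    """a1, b1 such that exp(r + q*T) = a1*T + b1 at T = t1 and at T = t2."""
    e1 = math.exp(r + q * t1)
    e2 = math.exp(r + q * t2)
    a1 = (e2 - e1) / (t2 - t1)
    b1 = e1 - a1 * t1
    return a1, b1


def y(t, r, q, a1, b1):
    """exp(r + q*T) - a1*T - b1, the function whose sign is examined."""
    return math.exp(r + q * t) - a1 * t - b1


# Arbitrary illustrative values (assumptions for this sketch only).
r, q, t1, t2 = 0.3, -0.7, -1.0, 2.0
a1, b1 = secant_coefficients(r, q, t1, t2)

for t in (-3.0, t1, 0.5, t2, 4.0):
    print(t, round(y(t, r, q, a1, b1), 6))
```

The printed values vanish at the two contact points, are negative between them and positive outside, whichever sign `q` has, provided it is not zero.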


Example 27.3 


Consider again the data of Example 27.2. We have already seen that for this
distribution $\phi' = A + B\phi$, so that the regions of Type A are also of Type $A_1$. Among
unbiassed tests of the hypothesis this is the uniformly most powerful test.


Composite Hypotheses: Regions of Type B 


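27.12. We now consider the extension of the foregoing results to the case when
$H_0$ is composite. For simplicity we will suppose that there are two parameters $\theta_1$ and $\theta_2$,
$H_0$ specifying $\theta_1$ as say $\theta_{10}$ and leaving $\theta_2$ undetermined. Then a region $w_0$ will be said
to be of Type B if

(a) $\int_{w_0} p(\theta_{10}, \theta_2)\,dx = 1 - \alpha$ for all admissible $\theta_2$; (27.50)

(b) $\int_{w_0} p(\theta_1, \theta_2)\,dx$ may be differentiated twice with respect to $\theta_1$ under the integral
sign;

(c) $\left[\dfrac{\partial}{\partial\theta_1}\int_{w_0} p(\theta_1, \theta_2)\,dx\right]_{\theta_1=\theta_{10}} = 0$; (27.51)

(d) For any other region $w$ satisfying (27.50),
$$\left[\frac{\partial^2}{\partial\theta_1^2}\int_{w_0} p\,dx\right]_{\theta_1=\theta_{10}} \ge \left[\frac{\partial^2}{\partial\theta_1^2}\int_w p\,dx\right]_{\theta_1=\theta_{10}}. \qquad (27.52)$$

These conditions are obvious generalisations of those defining Type A. Putting now
$$\phi_i = \left[\frac{\partial \log p}{\partial\theta_i}\right]_{\theta_1=\theta_{10}}, \quad i = 1, 2, \qquad (27.53)$$
$$\phi_{ij} = \left[\frac{\partial^2 \log p}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_{10}}, \quad i, j = 1, 2, \qquad (27.54)$$
we state that the Type B region will exist and may be found if $\phi_1$ and $\phi_2$ are algebraically
independent, if
$$\phi_{11} = A_0 + A_1\phi_1 + A_2\phi_2,$$
$$\phi_{12} = B_0 + B_1\phi_1 + B_2\phi_2, \qquad (27.55)$$
$$\phi_{22} = C_0 + C_1\phi_2,$$
and if the law of distribution of $\phi_2$ is uniquely determined by its moments. We omit the
proof of this theorem, for which see Neyman (1935b).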


Simple Hypotheses with Two Parameters: Regions of Type C 


27.13. The extension of the foregoing theory to the case of a simple hypothesis
specifying several parameters presents some new features. Again to simplify the discussion
we shall consider two parameters, $\theta_1$ and $\theta_2$.

Consider the power function in the neighbourhood of $\theta_1 = \theta_2 = 0$ which we will suppose
to be the values specified by $H_0$. Writing for the function
$$\beta(\theta_1, \theta_2 \mid w) = \int_w p(\theta_1, \theta_2)\,dx, \qquad (27.56)$$
$$\beta_i = \left[\frac{\partial\beta}{\partial\theta_i}\right]_{\theta_1=\theta_2=0}, \qquad (27.57)$$
$$\beta_{ij} = \left[\frac{\partial^2\beta}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_2=0}, \qquad (27.58)$$
we have, assuming an expansion by Taylor’s theorem,
$$\beta(\theta_1, \theta_2 \mid w) = \beta(0, 0 \mid w) + \theta_1\beta_1(w) + \theta_2\beta_2(w) + \tfrac{1}{2}\{\theta_1^2\beta_{11}(w) + 2\theta_1\theta_2\beta_{12}(w) + \theta_2^2\beta_{22}(w)\} + \ldots \qquad (27.59)$$
To extend the idea of unbiassed tests to such a case we require in the first place
$$\beta_1 = \beta_2 = 0. \qquad (27.60)$$
Secondly, there will be a minimum at $\theta_1 = \theta_2 = 0$ if
$$\Delta = \beta_{12}^2 - \beta_{11}\beta_{22} < 0 \qquad (27.61)$$
and
$$\beta_{11} > 0. \qquad (27.62)$$
If these conditions are satisfied the power function for small values of $\theta_1$ and $\theta_2$ is effectively
$$\beta(\theta_1, \theta_2 \mid w) = 1 - \alpha + \tfrac{1}{2}\{\theta_1^2\beta_{11} + 2\theta_1\theta_2\beta_{12} + \theta_2^2\beta_{22}\}. \qquad (27.63)$$
We may represent this diagrammatically as in Fig. 27.3, which shows one of the ellipses
for which the power function is constant.

Since the hypothesis $H_0$ is that $\theta_1 = \theta_2 = 0$, we may speak of the value $\theta_1$ as the “error
in $\theta_1$” and similarly for $\theta_2$; and if, as in the case depicted, the co-ordinate axes are not
the same as the principal axes of the ellipse it is clear that for values of $\theta_1$ which are not
zero, errors of positive and negative sign in $\theta_2$ are not equal. From this viewpoint it may
be said that the minimisation of the power function does not control positive or negative
errors to the same extent; for the points A and B in Fig. 27.3 lie on the ellipse of constant
$\beta$, so that the probability of detecting them is the same, though A represents a positive
“error” in $\theta_2$ greater than the negative “error” given by B.

[Fig. 27.3. Ellipse of Constant Power for Simple Hypothesis with Two Parameters (see text).]


27.14. Whether this is a desirable property of the test depends to some extent on 
what the test is intended to do. To avoid the anomaly we must require that 
$$\beta_{12} = 0. \qquad (27.64)$$
Furthermore, even if this condition is satisfied and the principal axes of the ellipse coincide 
with the co-ordinate axes, there may still appear anomalies if the length of one axis is greater 
than that of the other; for then errors in one parameter are not detected as frequently 
as errors of the same size in the other. Here again it is a matter of particular circumstance 
whether such an effect is regarded as objectionable. (We disregard the fact that it can 
be removed by appropriate scaling of the parameters, which may or may not be artificial.) 
To remove it we must require that 


$$\beta_{11} = \beta_{22}, \qquad (27.65)$$


so that the ellipses reduce to circles. 
We may refer to the ellipses as “‘ curves of equidetectability.” 


27.15. With the foregoing explanation in mind we define $w_0$ as a regular unbiassed
critical region of Type C if it obeys the conditions
$$\beta_1(w_0) = \beta_2(w_0) = 0, \qquad (27.66)$$
$$\beta_{12}(w_0) = 0, \qquad (27.67)$$
$$\beta_{11}(w_0) = \beta_{22}(w_0), \qquad (27.68)$$
and if, for any other region $w$ obeying these three conditions and for which
$$\beta(0, 0 \mid w) = \beta(0, 0 \mid w_0) = 1 - \alpha, \qquad (27.69)$$
we have
$$\beta_{11}(w_0) \ge \beta_{11}(w). \qquad (27.70)$$
Secondly, if a region $w_1$ possesses the property that
$$\beta_1(w_1) = \beta_2(w_1) = 0, \qquad (27.71)$$
$$\beta_{12}^2(w_1) - \beta_{11}(w_1)\,\beta_{22}(w_1) < 0, \qquad (27.72)$$
and for any other region $w$ obeying the conditions
$$\beta(0, 0 \mid w) = \beta(0, 0 \mid w_1) = 1 - \alpha, \qquad (27.73)$$
$$\frac{\beta_{11}(w_1)}{\beta_{11}(w)} = \frac{\beta_{12}(w_1)}{\beta_{12}(w)} = \frac{\beta_{22}(w_1)}{\beta_{22}(w)}, \qquad (27.74)$$
we have
$$\beta_{11}(w_1) \ge \beta_{11}(w), \qquad (27.75)$$
we shall say that $w_1$ is a non-regular unbiassed critical region of Type C.

These equations are analytical ways of saying that the regular region of Type C is 
the one, among all regions having circular curves of equidetectability, which has the smallest 
radius for any given value of the power function ; whereas the non-regular region of Type C 
is the one, among all regions having similar ellipses of equidetectability, which has the 
smallest axes. 


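27.16. We now state without proof theorems similar to those demonstrated above
for the case of a single parameter.
Write
$$p_i = \left[\frac{\partial p}{\partial\theta_i}\right]_{\theta_1=\theta_2=0}, \qquad p_{ij} = \left[\frac{\partial^2 p}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_2=0}.$$

Then $w_0$ is a regular unbiassed critical region of Type C if
(a) inside $w_0$
$$p_{11} \ge k_1(p_{11} - p_{22}) + k_2\,p_{12} + k_3\,p_1 + k_4\,p_2 + k_5\,p, \qquad (27.76)$$
and outside $w_0$ the inequality is reversed;
(b)
$$\int_{w_0} p_i\,dx = \int_{w_0} p_{12}\,dx = \int_{w_0} (p_{11} - p_{22})\,dx = 0, \quad i = 1, 2. \qquad (27.77)$$

Secondly, if $w_1$ satisfies the conditions
(a) that inside $w_1$
$$p_{11} \ge k_1(\gamma_{12}\,p_{11} - \gamma_{11}\,p_{12}) + k_2(\gamma_{22}\,p_{11} - \gamma_{11}\,p_{22}) + k_3\,p_1 + k_4\,p_2 + k_5\,p \qquad (27.78)$$
and outside $w_1$ the inequality is reversed, the $k$'s as usual being constants and the $\gamma$'s obeying
the conditions
$$\gamma_{11} > 0, \qquad \gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0;$$
(b)
$$\int_{w_1} p_i\,dx = \int_{w_1} (\gamma_{12}\,p_{11} - \gamma_{11}\,p_{12})\,dx = \int_{w_1} (\gamma_{22}\,p_{11} - \gamma_{11}\,p_{22})\,dx = 0, \qquad (27.79)$$
then $w_1$ is a non-regular unbiassed critical region of Type C, having ellipses of
equidetectability determined by
$$\gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{constant.} \qquad (27.80)$$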




27.17. The theorem of invariance of 27.9 no longer holds in general for the present
case. If we transform to new parameters $\tau_1$ and $\tau_2$, the equations of transformation
$$d\tau_1 = \frac{\partial\tau_1}{\partial\theta_1}\,d\theta_1 + \frac{\partial\tau_1}{\partial\theta_2}\,d\theta_2,$$
etc. will not transform an ellipse co-axial with the co-ordinate axes $\theta_1$, $\theta_2$ into one co-axial
with $\tau_1$, $\tau_2$. Thus, in general, the effect of a transformation is to make a regular Type C
region into a non-regular Type C region.


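27.18. As usual, the conditions for the Type C region may be simply written in terms
of the derivatives of $\log p$. Write
$$\phi_i = \left[\frac{\partial \log p}{\partial\theta_i}\right]_{\theta_1=\theta_2=0}, \qquad (27.81)$$
$$\phi_{ij} = \left[\frac{\partial^2 \log p}{\partial\theta_i\,\partial\theta_j}\right]_{\theta_1=\theta_2=0}. \qquad (27.82)$$

Then if
$$\phi_{ij} = A_{ij} + B_{ij}\,\phi_1 + C_{ij}\,\phi_2, \qquad (27.83)$$
we shall have
$$p_{ij} = (\phi_i\phi_j + A_{ij} + B_{ij}\,\phi_1 + C_{ij}\,\phi_2)\,p, \qquad (27.84)$$
and the inequality (27.76) becomes
$$(1 - k_1)\,\phi_1^2 - k_2\,\phi_1\phi_2 + k_1\,\phi_2^2 - k_3'\,\phi_1 - k_4'\,\phi_2 - k_5' \ge 0, \qquad (27.85)$$
where the $k'$ are new constants easily expressible in terms of the old. They must be
determined so as to satisfy (27.77), which reduce to
$$\int_{w_0} \phi_i\,p\,dx = \int_{w_0} (\phi_1\phi_2 + A_{12})\,p\,dx = \int_{w_0} \{\phi_1^2 - \phi_2^2 + (A_{11} - A_{22})\}\,p\,dx = 0. \qquad (27.86)$$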

Example 27.4 


Suppose we have a sample of $n_1$ from a normal population with mean $\mu_1$ and unit
variance and a second sample of $n_2$ from a normal population with mean $\mu_2$ and also unit
variance. The simple hypothesis to be tested is $\mu_1 = \mu_2 = \mu_0$, where $\mu_0$ is some specified
value. We consider two cases:

(i) in which errors of the same size in $\mu_1$ and $\mu_2$ are equally important;

(ii) in which, for some reason, there is a stronger desire to avoid errors in $\mu_2$ than
in $\mu_1$ and that therefore a greater number $n_2$ of members has been taken in the second
sample. We also assume that the sizes of errors judged of equal importance are
inversely proportional to $\sqrt{n}$, so that we are led to consider new parameters
$$\eta_1 = (\mu_1 - \mu_0)\sqrt{n_1}, \qquad \eta_2 = (\mu_2 - \mu_0)\sqrt{n_2}. \qquad (27.87)$$

Case 1. The frequency function is
$$p = (2\pi)^{-\frac{1}{2}(n_1+n_2)} \exp\left[-\tfrac{1}{2}\left\{\sum_{i=1}^{n_1}(x_i - \mu_1)^2 + \sum_{i=n_1+1}^{n_1+n_2}(x_i - \mu_2)^2\right\}\right].$$
It will be found that
$$\phi_1 = n_1(\bar{x}_1 - \mu_0); \qquad \phi_2 = n_2(\bar{x}_2 - \mu_0);$$
$$\phi_{11} = -n_1 = A_{11}, \qquad \phi_{12} = 0 = A_{12}, \qquad \phi_{22} = -n_2 = A_{22}.$$

From (27.85) we then find
$$(1 - k_1)\,n_1^2(\bar{x}_1 - \mu_0)^2 - k_2\,n_1 n_2(\bar{x}_1 - \mu_0)(\bar{x}_2 - \mu_0) + k_1\,n_2^2(\bar{x}_2 - \mu_0)^2 - k_3\,n_1(\bar{x}_1 - \mu_0) - k_4\,n_2(\bar{x}_2 - \mu_0) - k_5 \ge 0. \qquad (27.88)$$
The law of distribution of $\bar{x}_1$ and $\bar{x}_2$ may be written
$$p \propto \exp\left[-\tfrac{1}{2}\{n_1(\bar{x}_1 - \mu_0)^2 + n_2(\bar{x}_2 - \mu_0)^2\}\right]. \qquad (27.89)$$
Put $u = \sqrt{n_1}\,(\bar{x}_1 - \mu_0)$ and $v = \sqrt{n_2}\,(\bar{x}_2 - \mu_0)$.
Then the region $w_0$ is determined by
$$(1 - k_1)\,n_1 u^2 - k_2\,uv\sqrt{n_1 n_2} + k_1\,n_2 v^2 - k_3\,u\sqrt{n_1} - k_4\,v\sqrt{n_2} - k_5 \ge 0, \qquad (27.90)$$
where
$$\iint_{w_0} p(u, v)\,du\,dv = 1 - \alpha,$$
$$\iint_{w_0} u\,p(u, v)\,du\,dv = \iint_{w_0} v\,p(u, v)\,du\,dv = \iint_{w_0} uv\,p(u, v)\,du\,dv = 0, \qquad (27.91)$$
$$\iint_{w_0} (n_1 u^2 - n_2 v^2)\,p(u, v)\,du\,dv = (1 - \alpha)(n_1 - n_2), \qquad (27.92)$$
and
$$p(u, v) = \frac{1}{2\pi}\exp\{-\tfrac{1}{2}(u^2 + v^2)\}.$$

It is evident from (27.90) that in the $(u, v)$ plane the boundary of $w_0$ is a conic. From
(27.91) we see that it must be coaxial with the co-ordinate axes and have its centre at the
origin. Hence $k_2 = k_3 = k_4 = 0$. Finally from (27.92) we find that the boundary is
of the form
$$\frac{u^2}{a^2} + \frac{v^2}{b^2} = 1, \qquad (27.93)$$
where
$$a^2 = \frac{k_5}{n_1(1 - k_1)}, \qquad b^2 = \frac{k_5}{n_2 k_1}. \qquad (27.94)$$

The Type C regions are then defined by (27.93), but we have to express $a$ and $b$ in terms
of known constants, including the probability level $1 - \alpha$. We have to satisfy (27.92),
and will show that a solution always exists.

Put
$$F(a, b) = \frac{1}{2\pi}\iint_{w_0} (n_1 u^2 - n_2 v^2)\exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha). \qquad (27.95)$$
If the boundary of $w_0$ is a circle, its radius is easily found to be
$$a = b = \sqrt{\{-2\log(1 - \alpha)\}}.$$
The integral $F(a, b)$ outside this circle, by the substitution $u = r\cos\psi$, $v = r\sin\psi$, is
found to be
$$F(a, a) = (n_1 - n_2)\,\frac{1}{2\pi}\iint_{w_0} u^2\exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - (n_1 - n_2)(1 - \alpha)$$
$$= (1 - \alpha)(n_1 - n_2)\,\tfrac{1}{2}a^2.$$
Now taking $w_0$ as the space outside the parallel lines
$$v = \pm\lambda,$$
which is given by $a$ infinite, so that
$$\frac{2}{\sqrt{(2\pi)}}\int_{\lambda}^{\infty} e^{-\frac{1}{2}z^2}\,dz = 1 - \alpha,$$
we have
$$F(\infty, \lambda) = -(n_1 - n_2)(1 - \alpha) + \frac{n_1}{2\pi}\iint_{w_0} u^2\exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv - \frac{n_2}{2\pi}\iint_{w_0} v^2\exp\{-\tfrac{1}{2}(u^2 + v^2)\}\,du\,dv$$
$$= -n_2\sqrt{\frac{2}{\pi}}\,\lambda\,e^{-\frac{1}{2}\lambda^2} < 0.$$

Similarly,
$$F(\lambda, \infty) = n_1\sqrt{\frac{2}{\pi}}\,\lambda\,e^{-\frac{1}{2}\lambda^2} > 0.$$

Thus, since $F(a, b)$ is continuous it must vanish somewhere in the range $\lambda \le a \le \infty$,
$\lambda \le b \le \infty$. The values for which it does so define the Type C region.
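The value of the integral for the circular boundary in Case 1 can be checked by direct quadrature in polar co-ordinates. The sketch below is only a numerical illustration: the sample sizes passed to it are arbitrary, `alpha` is the acceptance probability under the hypothesis, and the function names, grid sizes and the truncation radius `r_max` are assumptions of the sketch.

```python
import math


def f_circle_numeric(n1, n2, alpha, nr=2000, npsi=64, r_max=12.0):
    """Midpoint-rule value of the integral over the region outside the
    circle of radius a = sqrt(-2 log(1 - alpha)), minus (n1 - n2)(1 - alpha).
    """
    a = math.sqrt(-2.0 * math.log(1.0 - alpha))
    dr = (r_max - a) / nr
    dpsi = 2.0 * math.pi / npsi
    total = 0.0
    for i in range(nr):
        r = a + (i + 0.5) * dr
        # r^2 comes from u^2 or v^2, one further r from the polar Jacobian.
        weight = r ** 3 * math.exp(-0.5 * r * r) / (2.0 * math.pi) * dr * dpsi
        for j in range(npsi):
            psi = (j + 0.5) * dpsi
            total += (n1 * math.cos(psi) ** 2 - n2 * math.sin(psi) ** 2) * weight
    return total - (n1 - n2) * (1.0 - alpha)


def f_circle_closed(n1, n2, alpha):
    """Closed form (1 - alpha)(n1 - n2) a^2 / 2 with a^2 = -2 log(1 - alpha)."""
    a_squared = -2.0 * math.log(1.0 - alpha)
    return (1.0 - alpha) * (n1 - n2) * 0.5 * a_squared


print(f_circle_numeric(10, 6, 0.98), f_circle_closed(10, 6, 0.98))
```

The two values agree closely, and both vanish when the sample sizes are equal, in which case the circle itself satisfies the remaining condition.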


Case 2. In this case, using the two parameters of (27.87), we find


$$\phi_1 = u, \qquad \phi_2 = v,$$
$$\phi_{11} = -1, \qquad \phi_{12} = 0, \qquad \phi_{22} = -1.$$
The inequality becomes


$$(1-k_1)u^2 - k_2uv + k_3v^2 - k_4u - k_5v - k_6 > 0,$$
where $\iint_{w_0} (u^2 - v^2)\,p(u,v)\,du\,dv = 0$.


In a similar way it follows that the Type C region is the one lying outside the circle 
$$u^2 + v^2 = -2\log(1-\alpha).$$
We leave the verification of this result to the reader. 


Certain Limiting Properties 


27.19. From the foregoing examples it will be seen that in certain cases the optimum 
critical regions are by no means easy to determine numerically ; and it is not always clear 
that the labour involved is repaid by the results. Some consideration has been given by 
various writers to tests which have optimum properties for large n, the presumption being 
that the same tests will be good, if not the best, for small values. As usual when several 
limiting processes are involved simultaneously, the rigorous enunciation and proof of 
theorems in this field is a matter of some complexity, and we shall here merely indicate 
some of the results in very general terms without including proofs.

It has been shown by Neyman (1938b) that there do exist tests which are unbiassed
in the limit, and rules have been given for finding them. It has also been shown by Wald
(1941a) that there exist tests which are most powerful in the limit, and that such as are
based on maximum likelihood estimators are of this class. The tests are uniformly most
powerful for the single parameter $\theta > \theta_0$ and for $\theta < \theta_0$, but not both; and for any range
they are the most powerful unbiassed tests in the limit. Furthermore, the Type A test
tends to the most powerful unbiassed form.

The general conclusion seems to be that, even where the variation is not normal, most 
of the tests in current use which are based on likelihood estimators have optimum properties 
in the limit, and may therefore be used confidently for moderate or large samples. For 
small samples the position is not so clear, particularly for non-normal variation. Tests 
based on inefficient estimators are presumably less satisfactory ; and for the non-para- 
metric case there is as yet no complete theory. On this latter question reference may be 
made to a useful review by Scheffé (1943). 




The Unbiassed Character of Likelihood-ratio Tests 


27.20. It is of some interest to consider how far the tests based on likelihood (26.35) 
are unbiassed. 

It has been shown (Pitman, 1939b; Brown, 1939) that the Neyman-Pearson test in
the problem of k samples based on 4,,, is biassed unless all the samples are of the same size ; 
but that Bartlett’s modification (26.42) is unbiassed. We prove this in 27.25 below. 
On the other hand, Daly (1940) has shown that in certain multivariate tests such as those 
of regressions, multiple correlations, Hotelling's $T$ (which we introduce in the next chapter),
and the ordinary analysis of variance and covariance for orthogonal or non-orthogonal 
data, the likelihood-ratio tests are unbiassed, at least in the Type A sense (i.e. locally) 
and in some cases completely so. 


Pitman’s Method for Location and Scale Parameters 

27.21. In the special but not uncommon case where the hypotheses under test concern
parameters of scale or location, a simplified approach is possible. Suppose the joint
distribution of $k$ sample-values is

$$dF = f(x_1-\theta_1,\ x_2-\theta_2,\ \ldots,\ x_k-\theta_k)\,dx_1 \ldots dx_k. \tag{27.96}$$
We seek for a statistic $J$, independent of the $\theta$'s, to test the hypothesis; and clearly, if the
test is to be satisfactory, $J$ must be independent of the origin, i.e. must be seminvariant.
The test that the $\theta$'s are all equal is then equivalent to testing the hypothesis
$$\theta_1 = \theta_2 = \ldots = \theta_k = 0. \tag{27.97}$$
Without loss of generality we may suppose the hypothesis rejected if $J$ is small and less
than some quantity depending on the acceptance value $\alpha$, and we may also suppose $J$
positive; for if either condition is not satisfied we can transfer to some other function of
$J$ for which it is.

In the sample space $W$, $J$ must be constant along the line $x_1 = x_2 = \ldots = x_k =$ constant,
and therefore the critical region $w_0$ will be the one lying outside a hypercylinder
whose axis is parallel to this line. When $H_0$ is true, the probability of rejection is then

$$\int_{w_0} f(x_1, \ldots, x_k)\,dx = 1-\alpha, \tag{27.98}$$
Wo 


and when it is not true the probability is
$$\int_{w_0} f(x_1-\theta_1, \ldots, x_k-\theta_k)\,dx = \int_{w} f(x_1, \ldots, x_k)\,dx, \tag{27.99}$$
where $w$ is merely derived from $w_0$ by a translation in $W$ without rotation. If $L$ is any line
parallel to $x_1 = \ldots = x_k$, we write
$$P(L) = \int_L dF = \int_L f(x_1, \ldots, x_k)\,d\eta, \tag{27.100}$$
where
$$\eta = \frac{1}{\sqrt{k}}\,\Sigma(x), \tag{27.101}$$
and $\eta$ is thus the distance of the point $(x_1 \ldots x_k)$ from the plane $\Sigma(x) = 0$.




Now if $w_0$ is defined as the locus of all lines for which $P(L) > h$, a constant, $P(L)$ will
be less than $h$ on any $L$ which is in $w$ but not in $w_0$. Hence
$$\int_{w_0} dF > \int_{w} dF, \tag{27.102}$$
and so the resulting test is unbiassed. Thus an unbiassed test is given by choosing $J$ so
that at any point of a line $L$ it is equal to $P(L)$ at that point. Now we may write for the
variable co-ordinate on a particular $L$, say $\xi_i$,
$$\xi_i = x_i - t, \qquad \text{where } t = \frac{1}{k}\Sigma(x) - \frac{\eta}{\sqrt{k}}.$$
Hence
$$P(L) = \sqrt{k}\int_{-\infty}^{\infty} f(x_1-t,\ x_2-t,\ \ldots,\ x_k-t)\,dt. \tag{27.103}$$
Taking $J = \dfrac{1}{\sqrt{k}}\,P(L)$, we find
$$J = \int_{-\infty}^{\infty} f(x_1-t,\ x_2-t,\ \ldots,\ x_k-t)\,dt, \tag{27.104}$$
which gives us an unbiassed test.


Example 27.5 
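Consider the case where the variables are distributed normally with unit variance.
$$f = \frac{1}{(2\pi)^{\frac{1}{2}k}}\exp\{-\tfrac{1}{2}\Sigma(x_i-\theta_i)^2\}.$$
Then we have, from (27.104),
$$J = \frac{1}{(2\pi)^{\frac{1}{2}k}}\int_{-\infty}^{\infty}\exp\{-\tfrac{1}{2}\Sigma(x_i-t)^2\}\,dt = \frac{e^{-\frac{1}{2}S}}{\sqrt{k}\,(2\pi)^{\frac{1}{2}(k-1)}},$$
where $S = \Sigma(x-\bar{x})^2$.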


In practice we should take $S$ as our criterion, not $J$, and reject the hypothesis that
the means were equal if $S$ exceeded some fixed value determined by $\alpha$. We observe
that in fact $S$ is distributed as $\chi^2$ with $k - 1$ degrees of freedom when $H_0$ is true, so that
this value is easily ascertained.
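
The reduction of the integral criterion to the sum of squares $S$ can be confirmed numerically. The following sketch is an illustration under stated assumptions rather than anything given in the text: it evaluates $\int (2\pi)^{-k/2}\exp\{-\tfrac{1}{2}\Sigma(x_i-t)^2\}\,dt$ by a midpoint rule (the integration window and step count are arbitrary choices made here) and compares it with $e^{-S/2}/\{\sqrt{k}\,(2\pi)^{(k-1)/2}\}$.

```python
import math


def sum_of_squares(x):
    """S: sum of squared deviations of the sample values from their mean."""
    mean = sum(x) / len(x)
    return sum((xi - mean) ** 2 for xi in x)


def criterion_closed(x):
    """Closed form exp(-S/2) / (sqrt(k) (2 pi)^((k-1)/2))."""
    k = len(x)
    return math.exp(-0.5 * sum_of_squares(x)) / (
        math.sqrt(k) * (2.0 * math.pi) ** ((k - 1) / 2.0))


def criterion_numeric(x, half_width=12.0, steps=40_000):
    """Midpoint-rule value of the integral over t, centred on the sample mean."""
    k = len(x)
    centre = sum(x) / k
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        t = centre - half_width + (i + 0.5) * h
        total += math.exp(-0.5 * sum((xi - t) ** 2 for xi in x)) * h
    return total / (2.0 * math.pi) ** (k / 2.0)
```

Because the criterion is a decreasing function of $S$, small values of the criterion correspond to large values of $S$, which is why $S$ can be referred directly to the $\chi^2$ distribution.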


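27.22. Consider now the case where the frequency function is
$$\frac{1}{\phi_1\phi_2 \ldots \phi_k}\,f\!\left(\frac{x_1}{\phi_1},\ \frac{x_2}{\phi_2},\ \ldots,\ \frac{x_k}{\phi_k}\right). \tag{27.105}$$
If the $x$'s are positive in range we put
$$y_i = \log x_i, \qquad \theta_i = \log\phi_i, \tag{27.106}$$
and for the frequency function of the $y$'s we find
$$\exp(\Sigma y - \Sigma\theta)\,f(e^{y_1-\theta_1},\ e^{y_2-\theta_2},\ \ldots,\ e^{y_k-\theta_k}). \tag{27.107}$$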




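This reduces to our first case, and we have an unbiassed criterion for the hypothesis that
$$\phi_1 = \phi_2 = \ldots = \phi_k$$
by putting
$$J = \int_{-\infty}^{\infty}\exp(\Sigma y - kt)\,f(e^{y_1-t},\ e^{y_2-t},\ \ldots,\ e^{y_k-t})\,dt = \Pi(x)\int_0^{\infty} f\!\left(\frac{x_1}{\phi},\ \frac{x_2}{\phi},\ \ldots,\ \frac{x_k}{\phi}\right)\frac{d\phi}{\phi^{k+1}}. \tag{27.108}$$
When the $x$'s are not necessarily positive the expression remains the same, except that in
(27.108) $\Pi(x)$ becomes $\Pi(|x|)$. Small values of $J$ are significant.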


27.23. Suppose now that our hypothesis asserts the equality of $\theta$'s or $\phi$'s and
states that they have a common value $\theta_0$ or $\phi_0$, as the case may be. Then if we take


k 
J’ =-(1 x| ) f les Se ee : . (27.109) 


the test will be unbiassed. Moreover, if we regard small values of $J'$ as significant and the
$x$'s are independent, and if each frequency function is unimodal, then when
$$\theta_1 = \theta_2 = \ldots = \theta_k = \theta_0$$
is not true the probability that $J'$ exceeds the specified limit based on $1 - \alpha$ increases as
any $\theta$ tends to $\theta_0$. $J'$ therefore provides an unbiassed test.


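27.24. Finally, consider the case of $k$ variates each distributed in the form typified by
$$dF = \frac{1}{\Gamma(m)}\exp\left(-\frac{x}{\phi}\right)\left(\frac{x}{\phi}\right)^{m-1}\frac{dx}{\phi}. \tag{27.110}$$
Their joint distribution is
$$dF = \frac{1}{\Pi\{\phi\,\Gamma(m)\}}\exp\left\{-\Sigma\left(\frac{x}{\phi}\right)\right\}\Pi\left\{\left(\frac{x}{\phi}\right)^{m-1}\right\}\Pi(dx). \tag{27.111}$$
Hence, to test the hypothesis that the samples have the same $\phi$ we have
$$J = \frac{\Pi(x^m)}{\Pi\{\Gamma(m)\}}\int_0^{\infty} e^{-\Sigma(x)/t}\,\frac{dt}{t^{M+1}}, \qquad \text{where } M = \Sigma(m),$$
$$= \frac{\Gamma(M)\,\Pi(x^m)}{\Pi\{\Gamma(m)\}\,(\Sigma x)^M}. \tag{27.112}$$
It is sometimes convenient to deal with
$$K = \frac{\Pi(x^m)}{(\Sigma x)^M}, \tag{27.113}$$
which differs from $J$ only by a constant factor.
The maximum value of $K$ is
$$\frac{\Pi(m^m)}{M^M},$$
and we put
$$L = -\log\frac{K}{\max K} = M\log\left(\frac{\Sigma x}{M}\right) - \Sigma\left(m\log\frac{x}{m}\right). \tag{27.114}$$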


L is essentially not negative, and large values are significant. 




For testing the hypothesis that a set of variances have some specified equal value, we
find similarly from (27.109)
$$L' = \Sigma(x) - M - \Sigma\left(m\log\frac{x}{m}\right). \tag{27.115}$$


27.25. The foregoing result has an immediate application to the case of $k$ normal
samples, for the variances are then distributed in the Type III form of equation (27.110).
The criterion $L$ becomes
$$L = \tfrac{1}{2}\left\{N\log\left(\frac{\Sigma(\nu s^2)}{N}\right) - \Sigma(\nu\log s^2)\right\}, \tag{27.116}$$
where $\nu$ as usual represents the number of degrees of freedom and $N = \Sigma(\nu)$. This, as
will be seen by comparison with (26.93), is equivalent to Bartlett's test, and shows that
it is unbiassed.
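
As an illustration of how the criterion for Type III variates specialises to variances, the sketch below computes $M\log(\Sigma x/M) - \Sigma\{m\log(x/m)\}$ and the familiar variance form $N\log\{\Sigma(\nu s^2)/N\} - \Sigma(\nu\log s^2)$. It assumes the identification $x = \tfrac{1}{2}\nu s^2$, $m = \tfrac{1}{2}\nu$ (in units of the common variance), under which the second quantity is exactly twice the first; the function names and the numerical data are hypothetical.

```python
import math


def gamma_criterion(x, m):
    """M log(sum(x)/M) - sum(m log(x/m)) for gamma-type variates x with indices m."""
    M = sum(m)
    return M * math.log(sum(x) / M) - sum(
        mi * math.log(xi / mi) for xi, mi in zip(x, m))


def variance_criterion(variances, dofs):
    """N log(sum(nu s^2)/N) - sum(nu log s^2) for sample variances s^2 with nu d.f."""
    N = sum(dofs)
    pooled = sum(nu * s2 for nu, s2 in zip(dofs, variances)) / N
    return N * math.log(pooled) - sum(
        nu * math.log(s2) for nu, s2 in zip(dofs, variances))
```

Both quantities vanish when all the sample variances are equal and are otherwise positive, so large values count against the hypothesis of equal variances.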


NOTES AND REFERENCES 


For the theory of unbiassed tests see particularly Neyman and Pearson (1936; 1938)
and Neyman (1935b). Regions of Type B have also been considered by Scheffé (1942a),
who discusses a Type $B_1$ standing in relation to B as Type $A_1$ to Type A.

For limiting properties see Neyman (1938b) and Wald (1941a).

See also references to the previous chapter. 


EXERCISES 


27.1. Show that the test of Example 27.1 provides regions which are of Type $A_1$
as well as of Type A, and that the test is a U.M.P.U. one.


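27.2. Show that the cumulants of the distribution of $L$ of (27.114) are
$$\kappa_1 = M\{G_1(M) - \log M\} - \Sigma[m\{G_1(m) - \log m\}],$$
$$\kappa_r = (-1)^r\{\Sigma(m^rG_r(m)) - M^rG_r(M)\}, \qquad r > 1,$$
where $G_r(m) = \dfrac{d^r}{dm^r}\log\Gamma(m)$.

Hence show that the cumulants of $\dfrac{L}{1+\beta}$ are approximately $\kappa_r = \tfrac{1}{2}(k-1)\,\Gamma(r)$, where
$$\beta = \frac{1}{6(k-1)}\left\{\Sigma\left(\frac{1}{m}\right) - \frac{1}{M}\right\},$$
and thus that $\dfrac{2L}{1+\beta}$ is distributed approximately as $\chi^2$ with $k - 1$ degrees of freedom.
(Bartlett, 1937c; Pitman, 1939b.)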




27.3. Show that in samples of 3 from a normal population the distribution of the 
range r is given by— 


di — 


re 5 
é eae [ a. e~t* dy dr. 
on 


0 V (2m) 


Hence that an unbiassed critical region of Type A is given by 


pa Ps 
E ei" Ke dy | = 
0 Ty 
Te 


us 
rent i V6 ¢-" dy = 1, ¢~*! | V6 9-W dy, 
0 0 


the region lying outside r, <r <1. 
(Neyman and Pearson, 1936.) 


CHAPTER 28 


MULTIVARIATE ANALYSIS 


28.1. We have already considered some aspects of the case in which each member 
of a population is characterised by several variates $x_1 \ldots x_p$. For instance, we have
examined the measurement of correlation between the variates and the regression of one 
variate on some or all of the others. In this chapter we shall extend our inquiries into 
the multivariate case a good deal further, mainly by taking into account the possibility 
that different sample-members may have emanated from different populations. This 
will lead to some generalisations of the methods already discussed for the univariate case, 
such as tests of homogeneity and tests of differences between two samples. Some of our 
known results generalise with nothing more than additional mathematical complexity ; 
but in others certain new features appear, and the theory of multivariate analysis is not 
entirely a matter of generalising univariate results to $n$ dimensions.


28.2. One or two examples will illustrate the kind of problem with which we are 
concerned. A number of skulls are discovered in a burial-ground. They are found to 
vary among themselves in the manner usual in biological material. Is the observed varia- 
tion consistent with the hypothesis that all the skulls were derived from members of the 
same race or does it suggest a mixture of racial types? If heterogeneity is indicated, do 
the skulls fall into two well-defined categories, such as we might expect if the burial-ground 
were the site of a battle between two races such as Saxon and Celt; or are there several 
types such as we should expect in the normal burial-ground of a town where races were 
living together and interbreeding ? Or again, if the skulls are compared with another set 
known to have been buried at a much earlier time from the same race, is there any evidence 
of a significant change in skulls from one period to the other ? 

There is no single measurement on a skull which is marked out from the infinite number 
of possible measurements for deciding questions of this kind. It is quite common for 
thirty or forty measurements to be taken by craniometricians on a single skull. Even if 
we reject many of these for practical reasons, leaving out the jawbone, for instance, because 
it is often separated from the skull and cannot be identified, we shall still be left with a 
number p which require consideration. For n skulls we shall then have n sets of p values 
corresponding to variates $x_1 \ldots x_p$, which are, in general, correlated among themselves
and may be highly so. Our problem is to test the homogeneity of these values, or to estimate
differences between parent populations from which they were derived. We may,
of course, apply methods which are already familiar by picking out one variate and testing 
for homogeneity. But we might pick out quite an unsuitable one and sacrifice most of the 
information. Even if time permits we cannot take each variate in turn and test it because 
the variates are correlated and our p tests are not independent. 


28.3. Again, suppose we have two different breeds of laying hen and are given a 
batch of eggs from the hen-run without knowing which hen laid which egg. We require 
to allocate the eggs to the two breeds. Assuming that there is no decisive criterion such 
as colour of shell, we may measure various properties of the eggs such as length, breadth, 



weight, volume, specific gravity and so on. Some of these measurements will be highly 
correlated or, in the extreme case, perfectly correlated, as with weight, volume and specific 
gravity. In such circumstances we may reject some variates as redundant ; but in general 
we shall be left with several sets of measurements. Our problem is to find some method 
based on the retained variates for allocating the eggs to the correct parent breed. In 
particular we might search for the best linear function of the variates to discriminate between 
breeds and to enable us to assign the eggs with the maximum probability of correctness. 


28.4. Throughout the whole chapter we shall, except when the contrary is stated, 
assume that the variation is normal. In addition, to render our formulae a little less 
cumbrous we shall borrow a summation convention from the tensor calculus. If the 
affixes $i$, $j$ range from 1 to $p$ we shall write
$$A^{ij}a_{ij} = \sum_{i=1}^{p}\sum_{j=1}^{p} A^{ij}a_{ij}, \tag{28.1}$$
the affixes to $A$ being regarded as ordinary superscripts, not as powers. Similarly we
shall have
$$A^{ij}a_{jk} = \sum_{j=1}^{p} A^{ij}a_{jk}. \tag{28.2}$$
Whenever an affix occurs as a superscript and a subscript, summation is to be understood.
Clearly the actual letter used is immaterial and we have, for instance,
$$A^{ij}a_{ij} = A^{kl}a_{kl} = A^{lm}a_{lm}. \tag{28.3}$$
We shall write the array of quantities $A^{ij}$ (a square matrix) as $(A^{ij})$ and its determinant
as $|A^{ij}|$ or simply as $|A|$.
To every matrix $(a_{ij})$ with a non-vanishing determinant there corresponds a reciprocal
or inverse matrix which we may write $(a^{ij})$. Since
$$(a_{ij})(a^{ij}) = 1,$$
we have, on carrying out the multiplication,
$$a_{ij}a^{ik} = 1, \quad j = k,$$
$$\phantom{a_{ij}a^{ik}} = 0, \quad j \neq k,$$
which we may express as
$$a_{ij}a^{ik} = \delta_j^k, \tag{28.4}$$
where $\delta_j^k$, one form of the Kronecker delta, is zero if $j \neq k$ and unity otherwise. The quantity
$a^{jk}$ is the signed minor (cofactor) of $a_{jk}$ in the determinant $|a_{ij}|$ divided by that determinant itself.
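
The relation between a matrix, its reciprocal and the Kronecker delta can be made concrete with a short computation. This is only a sketch: the function names are chosen here, the determinant is found by Laplace expansion (adequate for small $p$ only), and the reciprocal element $a^{ik}$ is taken as the signed minor of $a_{ik}$ divided by the determinant, so that summing $a_{ij}a^{ik}$ over the repeated affix $i$ should give unity when $j = k$ and zero otherwise.

```python
def minor(matrix, i, j):
    """Matrix with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(matrix) if r != i]


def determinant(matrix):
    """Laplace expansion along the first row (practical for small p only)."""
    if not matrix:
        return 1.0  # determinant of the empty matrix
    return sum((-1) ** j * matrix[0][j] * determinant(minor(matrix, 0, j))
               for j in range(len(matrix)))


def reciprocal(a):
    """Elements a^{ik}: signed minor of a_{ik} divided by the determinant."""
    p = len(a)
    d = determinant(a)
    return [[(-1) ** (i + k) * determinant(minor(a, i, k)) / d
             for k in range(p)] for i in range(p)]


def contract(a, a_upper):
    """delta_j^k = a_{ij} a^{ik}, summed over the repeated affix i."""
    p = len(a)
    return [[sum(a[i][j] * a_upper[i][k] for i in range(p))
             for k in range(p)] for j in range(p)]
```

For the symmetric matrices of correlations and dispersions used in this chapter the order of the affixes is immaterial, but the contraction above holds for any non-singular square matrix.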


28.5. It will further simplify our formulae and will give rise to no loss of generality 
if we suppose our variates to be in standard measure, that is to say, to have zero mean 
and unit variance. If we require results for the more general case we can easily obtain 
them from transformations of the type 

$$x_i = \sigma_i\xi_i + m_i. \tag{28.5}$$
With this convention the equation of the multivariate normal distribution (cf. 15.12,
vol. I, p. 376) may be written
$$dF = \frac{|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}}\exp\left(-\tfrac{1}{2}A^{ij}x_ix_j\right)dx_1 \ldots dx_p, \tag{28.6}$$




where the $A$'s are related to the correlation determinant
$$|\rho_{ij}|. \tag{28.7}$$
In fact $(A^{ij})$ is reciprocal to $(\rho_{ij})$, as we saw in 15.12.


28.6. We shall also frequently refer to the matrix of sample variances and covariances
which we shall call the dispersion matrix and write as $(a_{ij})$, where
$$a_{ij} = \frac{1}{n}\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j). \tag{28.8}$$
ij=1 


This, it is to be remembered, is in standard measure for the population, that is to say the 
observed variates are taken from the parent means and divided by the parent standard 
deviations. 


Wishart’s Distribution 

28.7. We now proceed to generalise to p variates the joint distribution of dispersions 
arrived at in 14.12 (vol. I, p. 339) for the bivariate case; and we shall also show that 
the distribution is independent of that of means. The result and method of proof are 
due to Wishart (1928). 

First of all let us write the result for the bivariate case in our new notation. For 
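the distribution of means we have
$$dF = \frac{n|A|^{\frac{1}{2}}}{2\pi}\exp\left(-\tfrac{1}{2}nA^{ij}\bar{x}_i\bar{x}_j\right)d\bar{x}_1\,d\bar{x}_2, \tag{28.9}$$
and for that of dispersions
$$dF = \frac{n^{n-1}\,|A|^{\frac{1}{2}(n-1)}\,|a|^{\frac{1}{2}(n-4)}}{2^{n-1}\,\pi^{\frac{1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)}\exp\left(-\tfrac{1}{2}nA^{ij}a_{ij}\right)da_{11}\,da_{12}\,da_{22}. \tag{28.10}$$
For instance, we have
$$a_{11} = s_1^2, \qquad a_{12} = rs_1s_2, \qquad a_{22} = s_2^2,$$
$$A^{11} = A^{22} = \frac{1}{1-\rho^2}, \qquad A^{12} = -\frac{\rho}{1-\rho^2},$$
so that (28.10) is equivalent to
$$dF = \frac{n^{n-1}\,s_1^{n-2}\,s_2^{n-2}\,(1-r^2)^{\frac{1}{2}(n-4)}}{2^{n-3}\,\pi^{\frac{1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)(1-\rho^2)^{\frac{1}{2}(n-1)}}\exp\left\{-\frac{n}{2(1-\rho^2)}(s_1^2 - 2\rho rs_1s_2 + s_2^2)\right\}ds_1\,ds_2\,dr.$$
This, with the substitution
$$\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right) = \frac{\sqrt{\pi}\,\Gamma(n-2)}{2^{n-3}},$$
is the form found in equation (14.44), vol. I, p. 342, when it is remembered that we are
working in standard measure.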




28.8. Now consider the general case. With a sample of n values of p variates we 
consider p rectangular spaces of n dimensions each as the domain of variation. If a point 
in one of these spaces be fixed, the variation in the other spaces is constrained for fixed 
values of the sample dispersions. The following argument is a generalisation of that given 
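in 14.12 leading to the bivariate result, and the reader may like to refresh his memory
by re-reading that section.

Writing $x_{j1} \ldots x_{jn}$ for the $n$ values of the $j$th variate, we have for the density function
of the whole sample, from (28.6),
$$\frac{|A|^{\frac{1}{2}n}}{(2\pi)^{\frac{1}{2}np}}\exp\left\{-\tfrac{1}{2}A^{ij}\sum_{k=1}^{n}x_{ik}x_{jk}\right\} = \frac{|A|^{\frac{1}{2}n}}{(2\pi)^{\frac{1}{2}np}}\exp\left[-\tfrac{1}{2}A^{ij}\left\{\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)\right\}\right]\times\exp\left(-\tfrac{1}{2}nA^{ij}\bar{x}_i\bar{x}_j\right). \tag{28.11}$$
We may thus factorise the density function into two parts,
$$f_1 = \frac{n^{\frac{1}{2}p}\,|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}}\exp\left(-\tfrac{1}{2}nA^{ij}\bar{x}_i\bar{x}_j\right) \tag{28.12}$$
and
$$f_2 = \frac{|A|^{\frac{1}{2}(n-1)}}{n^{\frac{1}{2}p}\,(2\pi)^{\frac{1}{2}p(n-1)}}\exp\left(-\tfrac{1}{2}nA^{ij}a_{ij}\right), \tag{28.13}$$
where we have chosen the constant factor of $f_1$ so that the distribution shall have the total
frequency unity.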


Consider now the volume element $\prod_{k=1}^{p} dx_{k1}\,dx_{k2} \ldots dx_{kn}$. In any particular $n$-space
the density is constant over hyperspheres centred at the mean. The volume element may
then be represented as the product of elements $d\bar{x}_i$ and of independent elements depending
on dispersions. In the total space of $pn$ dimensions the volume element may thus be
represented as the product of $p$ elements $d\bar{x}_i$ and an independent element depending on
dispersions. Thus the volume element also factorises, and we have immediately for the
distribution of means
$$dF = \frac{n^{\frac{1}{2}p}\,|A|^{\frac{1}{2}}}{(2\pi)^{\frac{1}{2}p}}\exp\left(-\tfrac{1}{2}nA^{ij}\bar{x}_i\bar{x}_j\right)\prod_{i=1}^{p}d\bar{x}_i, \tag{28.14}$$

showing that the means are distributed in the multivariate normal form independently 
of dispersions. 

If we define a matrix $(B)$ with elements $\tfrac{1}{2}n$ times those of $(A)$, we may write the
distribution of means in the simple form
$$dF = \frac{|B|^{\frac{1}{2}}}{\pi^{\frac{1}{2}p}}\exp\left(-B^{ij}\bar{x}_i\bar{x}_j\right)\Pi\,d\bar{x}. \tag{28.15}$$
We note that this checks with the known results for $p = 1$ and $p = 2$. It is also seen
almost at once that the variance of $\bar{x}_i$ is $\sigma_i^2/n$, as we expect.


28.9. We have now to consider the more complicated expression for the volume 
element of dispersions. Let us in the first instance transfer our origins to the sample means, 
remembering that in doing so we have lost one dimension (or degree of freedom) in the 
variation of our sample-points. Let $P_1 \ldots P_p$ be the sample-points whose co-ordinates
are the $n$ values of $x_1 \ldots x_p$, one point $P$ lying in each $n$-space. We shall consider in
turn the variation of $P_1$, then that of $P_2$ for fixed $P_1$, then that of $P_3$ for fixed $P_1$ and $P_2$,




and so on. The total variation will be given by multiplying the various expressions so
obtained; and it will be sufficient if we consider the typical case of the variation of $P_m$
for fixed points $P_1 \ldots P_{m-1}$.

For a fixed length $OP_m$ and fixed angles with $OP_1 \ldots OP_{m-1}$, $P_m$ can vary on a
hypersphere of $n - m$ dimensions; for, if we fix any particular angle, $P_m$ is constrained
to lie on a hypercone which cuts its hypersphere of variation in a hypersphere of one fewer
dimensions, and the fixation of the origin at the sample mean imposes a further constraint.
Further, if we regard the $p$ spaces as superposed, as we may, the centre of this $(n - m)$-dimensional
hypersphere is the foot of the perpendicular from $P_m$ on to the space containing
the points $O, P_1 \ldots P_{m-1}$. Call the length of this perpendicular for the time being $r_m$.

The volume of a $k$-dimensional hypersphere of radius $r$ is
$$\frac{\pi^{\frac{1}{2}k}\,r^k}{\Gamma(\tfrac{1}{2}k + 1)},$$
and its surface area, obtained by differentiating with respect to $r$, is
$$\frac{2\pi^{\frac{1}{2}k}\,r^{k-1}}{\Gamma(\tfrac{1}{2}k)}.$$
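
The content and surface of a $k$-dimensional hypersphere, $\pi^{k/2}r^k/\Gamma(\tfrac{1}{2}k+1)$ and $2\pi^{k/2}r^{k-1}/\Gamma(\tfrac{1}{2}k)$, are easily evaluated and checked against the familiar circle and sphere. The sketch below is illustrative only; the function names are chosen here, and `math.gamma` from the standard library supplies the gamma function.

```python
import math


def hypersphere_volume(k, r):
    """Content of a k-dimensional hypersphere of radius r."""
    return math.pi ** (k / 2.0) * r ** k / math.gamma(k / 2.0 + 1.0)


def hypersphere_surface(k, r):
    """Surface of the same hypersphere: the derivative of the content with respect to r."""
    return 2.0 * math.pi ** (k / 2.0) * r ** (k - 1) / math.gamma(k / 2.0)
```

For $k = 2$ these give the area and circumference of a circle, and for $k = 3$ the volume and surface of an ordinary sphere.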
The surface area of the hypersphere of variation of $P_m$ is thus
$$\frac{2\pi^{\frac{1}{2}(n-m)}\,r_m^{\,n-m-1}}{\Gamma\left(\frac{n-m}{2}\right)}. \tag{28.17}$$
To find the element of volume due to the variation of $P_m$ and the angles which $OP_m$
makes with $OP_1 \ldots OP_{m-1}$ we have to multiply (28.17) by an element of variation
normal to the hypersphere of $n - m$ dimensions. This variation lies in the hyperplane
determined by the origin and $P_1 \ldots P_m$ which is, in fact, normal to the hypersphere.
To evaluate it, consider the transformation
$$\xi_{mj} = \sum_{k=1}^{n} x_{mk}x_{jk}, \qquad j = 1, \ldots, m, \tag{28.18}$$
where, of course, the $x$'s are measured from the sample means in virtue of our choice of
origin. We have for the Jacobian
$$J = \frac{\partial(\xi_{m1}, \ldots, \xi_{mm})}{\partial(x_{m1}, \ldots, x_{mm})} = \begin{vmatrix} x_{11} & x_{12} & \ldots & x_{1m} \\ x_{21} & x_{22} & \ldots & x_{2m} \\ \vdots & & & \vdots \\ 2x_{m1} & 2x_{m2} & \ldots & 2x_{mm} \end{vmatrix} = 2v_m, \tag{28.19}$$
where $v_m$ is the volume (or "content") of the hyperparallelopiped having one corner at
the origin and edges running to the points $P_1 \ldots P_m$. Furthermore,
$$|\xi_{jk}| = \left|\sum_{l} x_{jl}x_{kl}\right| = v_m^2. \tag{28.20}$$




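The required element is thus
$$\frac{1}{2v_m}\prod_{k=1}^{m}d\xi_{mk},$$
and the total element of variation of $P_m$, on multiplication by (28.17), is
$$\frac{\pi^{\frac{1}{2}(n-m)}\,r_m^{\,n-m-1}}{\Gamma\left(\frac{n-m}{2}\right)v_m}\prod_{k=1}^{m}d\xi_{mk}. \tag{28.21}$$
Now $r_m$ is the length of the perpendicular from $P_m$ on to the space $OP_1 \ldots P_{m-1}$
and is therefore equal to $v_m/v_{m-1}$. Hence, for the variation of $P_m$ we have the element
$$\frac{\pi^{\frac{1}{2}(n-m)}\,v_m^{\,n-m-2}}{\Gamma\left(\frac{n-m}{2}\right)v_{m-1}^{\,n-m-1}}\prod_{k=1}^{m}d\xi_{mk}. \tag{28.22}$$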


We now derive the total element for variation of $P_1 \ldots P_p$ by multiplying expressions
of type (28.22) for $m = 1, 2, \ldots, p$. The terms in $v$ cancel except $v_p$ and $v_0$, the latter
being unity, and we find
$$\frac{\pi^{\frac{1}{4}p(2n-p-1)}\,v_p^{\,n-p-2}}{\prod_{k=1}^{p}\Gamma\left(\frac{n-k}{2}\right)}\prod_{j=1}^{p}\prod_{k=1}^{j}d\xi_{jk}. \tag{28.23}$$
Now from (28.18) we have
$$\xi_{jk} = na_{jk}, \tag{28.24}$$
and from (28.20)
$$v_p^2 = n^p\,|a|. \tag{28.25}$$

Making the necessary substitutions in (28.23) and adjoining the frequency element given
by (28.13) we find, after a little reduction,
$$dF = \frac{\left(\frac{n}{2}\right)^{\frac{1}{2}p(n-1)}|A|^{\frac{1}{2}(n-1)}\,|a|^{\frac{1}{2}(n-p-2)}}{\pi^{\frac{1}{4}p(p-1)}\prod_{k=1}^{p}\Gamma\left(\frac{n-k}{2}\right)}\exp\left(-\tfrac{1}{2}nA^{ij}a_{ij}\right)\Pi\,da. \tag{28.26}$$
This is Wishart's generalisation of the distribution of dispersions in a multivariate
normal system. The reader who feels that the foregoing proof demands too much of his
powers of geometrical insight may refer to alternative derivations by Wishart and Bartlett
(1933c) or P. L. Hsu (1939a). The domain of variation of the $a$'s is 0 to $\infty$ for $a_{ii}$, and
corresponding values for $a_{ij}$, $i \neq j$, such that correlations do not exceed unity in absolute
value.


28.10. It must be remembered that we are regarding $a_{ij}$ as the same as $a_{ji}$ and that
the product of differential elements in (28.26) contains $\tfrac{1}{2}p(p+1)$ items, not $p^2$; for there
are $p$ elements of the form $da_{ii}$ and $\tfrac{1}{2}p(p-1)$ of the form $da_{ij}$, $i \neq j$. The summation
in $A^{ij}a_{ij}$, however, takes place over $i$, $j$ from 1 to $p$, so that any particular term such as
$A^{34}a_{34}$ occurs twice, once as $A^{34}a_{34}$ and once as $A^{43}a_{43}$; except that when $i = j$ the term
occurs once. For instance, with $p = 2$ we have
$$A^{ij}a_{ij} = A^{11}a_{11} + 2A^{12}a_{12} + A^{22}a_{22}. \tag{28.27}$$
We can now derive the characteristic function of the Wishart distribution. Ignoring




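constant factors and writing a single integral sign for summation over all $a_{ij}$, we have,
from (28.26),
$$\int |a|^{\frac{1}{2}(n-p-2)}\exp\left(-\tfrac{1}{2}nA^{ij}a_{ij}\right)\Pi\,da = \frac{K}{|A|^{\frac{1}{2}(n-1)}}, \tag{28.28}$$
where $K$ is some constant. In this form let us replace $A^{ij}$ by $A^{ij} - \dfrac{1}{n}\theta^{ij}$ when $i \neq j$ and
by $A^{ii} - \dfrac{2}{n}\theta^{ii}$ when $i = j$. Then the resulting integral is the characteristic function of
the $a$'s, $\theta^{ij}$ being the parameter $it^{ij}$ corresponding to $a_{ij}$. We thus have
$$\phi(\theta) = |A|^{\frac{1}{2}(n-1)}\begin{vmatrix} A^{11} - \dfrac{2}{n}\theta^{11} & A^{12} - \dfrac{1}{n}\theta^{12} & \ldots & A^{1p} - \dfrac{1}{n}\theta^{1p} \\ A^{12} - \dfrac{1}{n}\theta^{12} & A^{22} - \dfrac{2}{n}\theta^{22} & \ldots & A^{2p} - \dfrac{1}{n}\theta^{2p} \\ \vdots & & & \vdots \\ A^{1p} - \dfrac{1}{n}\theta^{1p} & A^{2p} - \dfrac{1}{n}\theta^{2p} & \ldots & A^{pp} - \dfrac{2}{n}\theta^{pp} \end{vmatrix}^{-\frac{1}{2}(n-1)}, \tag{28.29}$$
the constant being evaluated by the consideration that $\phi(0) = 1$.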


Example 28.1 


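Let us apply these results to an examination of the moments of the distribution of
covariance in the bivariate case. We have
$$A^{11} = A^{22} = \frac{1}{1-\rho^2}, \qquad A^{12} = -\frac{\rho}{1-\rho^2}.$$
We then find for the c.f. of $a_{11}$, $a_{12}$, $a_{22}$
$$\phi = (1-\rho^2)^{-\frac{1}{2}(n-1)}\begin{vmatrix} \dfrac{1}{1-\rho^2} - \dfrac{2\theta^{11}}{n} & -\dfrac{\rho}{1-\rho^2} - \dfrac{\theta^{12}}{n} \\ -\dfrac{\rho}{1-\rho^2} - \dfrac{\theta^{12}}{n} & \dfrac{1}{1-\rho^2} - \dfrac{2\theta^{22}}{n} \end{vmatrix}^{-\frac{1}{2}(n-1)}.$$
We are interested only in the parameter $\theta^{12}$ which we will write as $\theta$, putting the others
equal to zero. We then find
$$\phi(\theta) = (1-\rho^2)^{-\frac{1}{2}(n-1)}\left\{\frac{1}{(1-\rho^2)^2} - \left(\frac{\rho}{1-\rho^2} + \frac{\theta}{n}\right)^2\right\}^{-\frac{1}{2}(n-1)} = \left\{1 - \frac{2\rho\theta}{n} - \frac{(1-\rho^2)\theta^2}{n^2}\right\}^{-\frac{1}{2}(n-1)}.$$
Taking logarithms and evaluating coefficients of powers of $\theta$, we find for the cumulants
$$\kappa_1 = \frac{n-1}{n}\,\rho,$$
$$\kappa_2 = \frac{n-1}{n^2}\,(1+\rho^2),$$
$$\kappa_3 = \frac{2(n-1)}{n^3}\,\rho\,(3+\rho^2),$$
$$\kappa_4 = \frac{6(n-1)}{n^4}\,(1 + 6\rho^2 + \rho^4).$$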



In standard measure the distribution tends to normality as $n$ tends to infinity. But for
finite $n$ we have
$$\beta_1 = \frac{4}{n-1}\,\frac{\rho^2(3+\rho^2)^2}{(1+\rho^2)^3},$$
$$\beta_2 - 3 = \frac{6}{n-1}\,\frac{1 + 6\rho^2 + \rho^4}{(1+\rho^2)^2}.$$
Thus, even when $\rho = 0$ our distribution, though symmetrical, is not normal.
Wishart (1928) has given formulae as far as those of the fourth order for eight or
fewer variates.
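
The cumulants of the sample covariance can be verified against the logarithm of the characteristic function. The sketch below is an illustration, not part of the original example: it assumes the c.f. $\{1 - 2\rho\theta/n - (1-\rho^2)\theta^2/n^2\}^{-\frac{1}{2}(n-1)}$ in the real parameter $\theta$, takes the first four cumulants as $(n-1)/n^r$ times $\rho$, $1+\rho^2$, $2\rho(3+\rho^2)$ and $6(1+6\rho^2+\rho^4)$, and forms $\beta_1$ and $\beta_2$ from them. Function names are chosen here.

```python
import math


def log_cf(theta, n, rho):
    """Logarithm of the c.f. of the sample covariance in the real parameter theta."""
    g = 1.0 - 2.0 * rho * theta / n - (1.0 - rho ** 2) * theta ** 2 / n ** 2
    return -0.5 * (n - 1) * math.log(g)


def covariance_cumulants(n, rho):
    """First four cumulants of the sample covariance (divisor n, standard measure)."""
    base = (rho,
            1.0 + rho ** 2,
            2.0 * rho * (3.0 + rho ** 2),
            6.0 * (1.0 + 6.0 * rho ** 2 + rho ** 4))
    return tuple((n - 1) * b / n ** (r + 1) for r, b in enumerate(base))


def shape_coefficients(n, rho):
    """beta_1 = kappa_3^2 / kappa_2^3 and beta_2 = 3 + kappa_4 / kappa_2^2."""
    _, k2, k3, k4 = covariance_cumulants(n, rho)
    return k3 ** 2 / k2 ** 3, 3.0 + k4 / k2 ** 2
```

With $\rho = 0$ the third cumulant vanishes but $\beta_2 - 3 = 6/(n-1)$, which shows the departure from normality in a symmetrical distribution.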


Hotelling’s Distribution 
28.11. In the univariate case we can test the significance of a mean by comparing
it with the estimated standard deviation, the ratio being distributed in "Student's" form
(or some simple transformation of it if we compare the mean with the actual sample variance
and not the unbiassed estimator). We proceed to generalise this result.
We require a single quantity which will serve as a measure of departure of all the means
$\bar{x}_i$ from the population values which, as usual, we take to be zero. In place of the matrix
of dispersions, we shall consider the matrix of sums of squares and products $(b_{ij})$ where
$$b_{ij} = \sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j). \tag{28.30}$$
As usual we take $(b^{ij})$ to be the matrix inverse to $(b_{ij})$. Let us now write
$$T^2 = n(n-1)\,b^{ij}\bar{x}_i\bar{x}_j. \tag{28.31}$$
This is Hotelling's generalisation of the "Student" ratio $t$.
In the simplest case when $p = 1$ we have
$$b_{11} = ns^2, \qquad b^{11} = \frac{1}{ns^2},$$
and hence
$$T^2 = \frac{(n-1)\,\bar{x}^2}{s^2} = t^2, \tag{28.32}$$
so that $T$ becomes equal to the ratio $t$ as required.
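
A direct computation shows how the generalised ratio reduces to "Student's" $t$ for a single variate. The sketch below is a minimal illustration under stated assumptions: the population means are taken as zero, $s^2$ is the sample variance with divisor $n$, and $b^{ij}\bar{x}_j$ is obtained by solving the linear equations with a small Gaussian elimination written here (the helper names are not from the text).

```python
import math


def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    p = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, p):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, p + 1):
                aug[r][c] -= factor * aug[col][c]
    y = [0.0] * p
    for r in reversed(range(p)):
        y[r] = (aug[r][p] - sum(aug[r][c] * y[c] for c in range(r + 1, p))) / aug[r][r]
    return y


def hotelling_T2(sample):
    """T^2 = n (n - 1) b^{ij} xbar_i xbar_j; sample is a list of n rows of p values."""
    n, p = len(sample), len(sample[0])
    xbar = [sum(obs[i] for obs in sample) / n for i in range(p)]
    b = [[sum((obs[i] - xbar[i]) * (obs[j] - xbar[j]) for obs in sample)
          for j in range(p)] for i in range(p)]
    y = solve(b, xbar)  # y_i = b^{ij} xbar_j
    return n * (n - 1) * sum(xi * yi for xi, yi in zip(xbar, y))


def student_t(values):
    """t = xbar sqrt(n - 1) / s, with s^2 the sample variance using divisor n."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return mean * math.sqrt(n - 1) / s
```

Passing a single column of values to `hotelling_T2` should reproduce the square of `student_t` for the same values.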


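28.12. We have
$$\frac{T^2}{n-1} = n\,b^{ij}\bar{x}_i\bar{x}_j. \tag{28.33}$$
Let us now denote by $m_{ij}$ the sum of squares or products about the origin, so that
$$m_{ij} = b_{ij} + n\bar{x}_i\bar{x}_j. \tag{28.34}$$
The determinant of $m_{ij}$ may be written
$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1\sqrt{n} & \bar{x}_2\sqrt{n} & \ldots & \bar{x}_p\sqrt{n} \\ 0 & b_{11} + n\bar{x}_1^2 & b_{12} + n\bar{x}_1\bar{x}_2 & \ldots & b_{1p} + n\bar{x}_1\bar{x}_p \\ 0 & b_{12} + n\bar{x}_1\bar{x}_2 & b_{22} + n\bar{x}_2^2 & \ldots & b_{2p} + n\bar{x}_2\bar{x}_p \\ \vdots & & & & \vdots \\ 0 & b_{1p} + n\bar{x}_1\bar{x}_p & b_{2p} + n\bar{x}_2\bar{x}_p & \ldots & b_{pp} + n\bar{x}_p^2 \end{vmatrix}.$$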



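On subtracting $\bar{x}_1\sqrt{n}$ times the first row from the second, and so on, we find
$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1\sqrt{n} & \ldots & \bar{x}_p\sqrt{n} \\ -\bar{x}_1\sqrt{n} & b_{11} & \ldots & b_{1p} \\ \vdots & & & \vdots \\ -\bar{x}_p\sqrt{n} & b_{1p} & \ldots & b_{pp} \end{vmatrix},$$
and on expanding according to the border row and column,
$$|m_{ij}| = |b_{ij}| + n\,b^{ij}\bar{x}_i\bar{x}_j\,|b_{ij}|. \tag{28.35}$$
It follows that
$$\frac{|b_{ij}|}{|m_{ij}|} = \frac{1}{1 + n\,b^{ij}\bar{x}_i\bar{x}_j} = \frac{1}{1 + \dfrac{T^2}{n-1}}. \tag{28.36}$$
This is a fundamental equation in the sampling theory of $T$ and we proceed to interpret
it geometrically.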
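
The determinantal identity $|b_{ij}|/|m_{ij}| = 1/\{1 + T^2/(n-1)\}$ may be checked on a small bivariate sample. The following sketch assumes $p = 2$, so that determinants and the quadratic form $b^{ij}\bar{x}_i\bar{x}_j$ can be written out explicitly; the function name and the data used to exercise it are hypothetical.

```python
def ratio_and_T2_form(xs, ys):
    """Return (|b| / |m|, 1 / (1 + T^2 / (n - 1))) for a bivariate sample."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Sums of squares and products about the means.
    b11 = sum((x - mx) ** 2 for x in xs)
    b22 = sum((y - my) ** 2 for y in ys)
    b12 = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Sums of squares and products about the origin.
    m11 = sum(x * x for x in xs)
    m22 = sum(y * y for y in ys)
    m12 = sum(x * y for x, y in zip(xs, ys))
    det_b = b11 * b22 - b12 ** 2
    det_m = m11 * m22 - m12 ** 2
    # Quadratic form b^{ij} xbar_i xbar_j using the explicit 2 x 2 inverse.
    quad = (b22 * mx * mx - 2.0 * b12 * mx * my + b11 * my * my) / det_b
    T2 = n * (n - 1) * quad
    return det_b / det_m, 1.0 / (1.0 + T2 / (n - 1))
```

The two returned values should agree to rounding error for any sample in which the two variates are not exactly collinear.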


28.13. In the case $p = 1$ we have a single sample space of $n$ dimensions. The numerator
and denominator of (28.36) then reduce to $b_{11}$ and $m_{11}$, that is to say, the squares of
distances from the sample-point $P_1$ to its projection on the unit vector whose direction
cosines are all equal, and from $P_1$ to the origin, respectively. The ratio of (28.36) has
zero dimensions and is in fact the square of the sine of the angle between $OP_1$ and the unit
vector. This is the geometrical approach which gave us "Student's" distribution in
Example 10.6 (vol. I, p. 239).

In the general case let us regard the $p$ $n$-spaces as superposed in one $n$-space. The
points $P_1 \ldots P_p$ will lie in a space of $p - 1$ dimensions, a hyperplane in the $n$-space.
Now we may rotate the axes without altering the functions $|m_{ij}|$ or $|b_{ij}|$ which are easily
seen to be invariant under orthogonal variate-transformations. If we perform such a
rotation so as to bring the $(p - 1)$-space of sample-points into correspondence with $p - 1$
co-ordinate dimensions, we see from (28.20) that $|m_{ij}|$ is the square of the content of a
hyperparallelopiped with one corner at the origin and sides parallel to $OP_1 \ldots OP_p$.
Now consider a hyperplane perpendicular to the unit vector meeting it, say, in $O'$,
and let $P'_1 \ldots P'_p$ be the projections of the points $P$ on to this hyperplane. Then $b_{ij}$
is the covariance of the co-ordinates $P'_i$ and $P'_j$ referred to $O'$, and hence $|b_{ij}|$ is the square
of the content of the hyperparallelopiped in the hyperplane. Furthermore, the content
of this figure bears to that given by $|m_{ij}|$ a ratio equal to the cosine of the angle between
the unit vector and the hyperplane. Representing this angle by $\theta$, we have
$$\frac{1}{1 + \dfrac{T^2}{n-1}} = \cos^2\theta. \tag{28.37}$$


28.14. Now if the sample-points P are distributed in the n-space with random 
orientation, the hyperplane which they determine will be distributed randomly in regard 
to the angle which it makes with a fixed vector, and in particular with the unit vector. 
The sampling distribution of $\theta$ is then that of an angle between a fixed vector and a random
plane. But this, from a slightly different viewpoint, is precisely the problem of distribution 
which we solved in connection with the multiple correlation coefficient R, for we saw (15.18, 




vol. I, p. 381) that $R$ is the sine of the angle between a residual vector represented by a
variate $x_{1.23\ldots p}$ and the space containing other variates $x_2 \ldots x_p$; and in the case when
the former is independent of the latter we can regard it as fixed. Thus, from (28.37) we
may write
$$\frac{1}{1 + \dfrac{T^2}{n-1}} = 1 - R^2. \tag{28.38}$$
The distribution of $R^2$ in the case when the variate concerned is independent of the
others is
$$dF = \frac{1}{B\left(\frac{n-p}{2}, \frac{p-1}{2}\right)}(1-R^2)^{\frac{1}{2}(n-p-2)}(R^2)^{\frac{1}{2}(p-3)}\,dR^2, \tag{28.39}$$
where we must remember that $p$ is the total number of variates and the variates are measured
from their means in forming the regression equation. Before substituting (28.38) in this
expression we must increase $p$ by unity, since in effect we are considering $p + 1$ variates,
the unit vector determining an additional one; and we must also increase $n$ by unity
because our variation is not restricted to that about the mean, as for multiple correlation.
With these alterations in (28.39), we have, on substituting for $R$ from (28.38) and a little
reduction,
$$dF = \frac{1}{B\left(\frac{n-p}{2}, \frac{p}{2}\right)}\,\frac{\left(\dfrac{T^2}{n-1}\right)^{\frac{1}{2}(p-2)}}{\left(1 + \dfrac{T^2}{n-1}\right)^{\frac{1}{2}n}}\,d\left(\frac{T^2}{n-1}\right). \tag{28.40}$$
This is the distribution of Hotelling's generalisation of "Student's" ratio.


28.15. At the end of the chapter we shall see that this is a particular case of a more 
general distribution (28.31). A third and instructive derivation, due to Wilks, is as 
follows :— 

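From the manner of derivation of Wishart's distribution it will be clear that if we
substitute the moments about the origin $a'_{ij}$ for those about the mean $a_{ij}$, the distribution
is the same, except that there is an extra degree of freedom. The distribution is then
$$dF = \frac{\left(\frac{n}{2}\right)^{\frac{1}{2}pn}|A|^{\frac{1}{2}n}\,|a'|^{\frac{1}{2}(n-p-1)}}{\pi^{\frac{1}{4}p(p-1)}\prod_{k=1}^{p}\Gamma\left(\frac{n+1-k}{2}\right)}\exp\left(-\tfrac{1}{2}nA^{ij}a'_{ij}\right)\Pi\,da'.$$
Putting $B^{ij} = \tfrac{1}{2}nA^{ij}$, we find, on integration,
$$\int |a'|^{\frac{1}{2}(n-p-1)}\exp\left(-B^{ij}a'_{ij}\right)\Pi\,da' = \frac{\pi^{\frac{1}{4}p(p-1)}\prod_{k=1}^{p}\Gamma\left(\frac{n+1-k}{2}\right)}{|B|^{\frac{1}{2}n}}. \tag{28.41}$$
Now replace $n$ by $n + 2r$ in this expression and divide by the term on the right in (28.41).
The result is to give us the $r$th moment of $|a'|$ as
$$\mu'_r(|a'|) = \frac{1}{|B|^r}\prod_{k=1}^{p}\frac{\Gamma\left(\frac{n+1-k}{2} + r\right)}{\Gamma\left(\frac{n+1-k}{2}\right)}. \tag{28.42}$$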




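We may also write the distribution of $a_{ij}$ in the form given by our original derivation of
Wishart’s distribution:

\[ dF = \frac{|B|^{\frac{1}{2}(n-1)} \, |a|^{\frac{1}{2}(n-p-2)}}{\pi^{\frac{1}{4}p(p-1)} \prod_{i=1}^{p} \Gamma\left(\frac{n-i}{2}\right)} \exp\left(-B^{ij} a_{ij}\right) \prod da \times \frac{|B|^{\frac{1}{2}}}{\pi^{\frac{1}{2}p}} \exp\left(-B^{ij} \bar{x}_i \bar{x}_j\right) \prod d\bar{x}. \]

Multiply this by $|a'|^r$, integrate, and use (28.42), transferring constant terms to the right
as in (28.41); then replace n by n + 2s and divide by the constant terms as they were
before substitution. We find

\[ \mu'_{r,s}(|a'|, |a|) = \frac{1}{|B|^{r+s}} \prod_{i=1}^{p} \frac{\Gamma\left(\frac{n+1-i}{2} + r + s\right) \Gamma\left(\frac{n-i}{2} + s\right)}{\Gamma\left(\frac{n+1-i}{2} + s\right) \Gamma\left(\frac{n-i}{2}\right)}. \tag{28.43} \]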
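Now put r = −s and note that

\[ \frac{|a|}{|a'|} = \frac{|b|}{|m|}. \]

We find

\[ \mu'_s\left(\frac{|b|}{|m|}\right) = \frac{\Gamma\left(\frac{n}{2}\right) \Gamma\left(\frac{n-p}{2} + s\right)}{\Gamma\left(\frac{n}{2} + s\right) \Gamma\left(\frac{n-p}{2}\right)} = \frac{B\left(\frac{n-p}{2} + s, \frac{p}{2}\right)}{B\left(\frac{n-p}{2}, \frac{p}{2}\right)}. \]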
Now the function on the right is the sth moment of

\[ dF = \frac{1}{B\left(\frac{n-p}{2}, \frac{p}{2}\right)} \, \theta^{\frac{1}{2}(n-p-2)} (1 - \theta)^{\frac{1}{2}(p-2)} \, d\theta, \tag{28.45} \]

which is uniquely determined by its moments. This, then, is the distribution of the ratio
$|b|/|m|$, and on substitution in terms of T from (28.36) brings us back to the distribution of
(28.40). Incidentally this method gives us one more derivation of the distribution of
multiple correlations and correlation ratios when the respective variates are independent.
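The collapse of the product of gamma functions into a ratio of beta functions can be checked numerically. The sketch below assumes that, with r = −s, the joint moment reduces to the product over i = 1, ..., p of Γ((n+1−i)/2) Γ((n−i)/2 + s) / {Γ((n+1−i)/2 + s) Γ((n−i)/2)}, and compares it with the sth moment of a beta variate with parameters (n − p)/2 and p/2. The function names are illustrative.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def moment_product_form(n, p, s):
    """s-th moment written as a product of gamma-function ratios."""
    log_m = 0.0
    for i in range(1, p + 1):
        log_m += (math.lgamma((n + 1 - i) / 2) + math.lgamma((n - i) / 2 + s)
                  - math.lgamma((n + 1 - i) / 2 + s) - math.lgamma((n - i) / 2))
    return math.exp(log_m)


def moment_beta_form(n, p, s):
    """s-th moment of a beta variate with parameters (n - p)/2 and p/2."""
    return math.exp(log_beta((n - p) / 2 + s, p / 2) - log_beta((n - p) / 2, p / 2))


PRODUCT_VALUE = moment_product_form(20, 3, 1.0)
BETA_VALUE = moment_beta_form(20, 3, 1.0)
```

For n = 20, p = 3 the first moment is the beta mean (n − p)/n = 0.85, and the two forms agree to rounding error for non-integral s as well.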


Significance of a Set of Means 


28.16. Suppose that we have a set of k samples with numbers $n_1, \ldots, n_k$, each
from a p-variate population. Let us also suppose that the populations have the same
dispersion matrix but different means, that of the jth variate in the lth sample being $\mu_{j(l)}$.
We proceed to derive a criterion for testing the means simultaneously. Our result is a 
generalisation of the testing of k means in normal samples, and we shall obtain it by applying 
the same method, namely by using the likelihood criterion 


\[ \lambda = \frac{p_0(\omega \ \mathrm{max.})}{p_1(\Omega \ \mathrm{max.})} \]

as given in equation (26.64). Here ω is the domain for which all the means of the jth
variate have a common value $\mu_j$, and Ω that for which they have the more general values
$\mu_{j(l)}$.
Let $b_{ij(l)}$ be the function $b_{ij}$ for the lth sample (l = 1, 2, ..., k) and $\bar{x}_{i(l)}$ the mean
of the ith variate in that sample. Put

\[ b_{ij} = \sum_{l=1}^{k} b_{ij(l)}, \tag{28.46} \]

where, of course,

\[ b_{ij(l)} = \sum_{t=1}^{n_l} (x_{it} - \bar{x}_{i(l)})(x_{jt} - \bar{x}_{j(l)}). \tag{28.47} \]


Put, for the functions of the pooled samples, 


1 % 
— 2, %;) - : : : . (28.48) 
t1 nr 1 


1 
n 


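If then

\[ m_{ij(l)} = \frac{1}{n_l} \sum_{t=1}^{n_l} (x_{it} - \mu_{i(l)})(x_{jt} - \mu_{j(l)}), \tag{28.50} \]

the likelihood of all samples together is

\[ c \, |A|^{\frac{1}{2}n} \exp\left\{ -\tfrac{1}{2} \sum_{l=1}^{k} \left( n_l A^{ij} m_{ij(l)} \right) \right\}, \tag{28.51} \]

where c is a constant.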
Taking logarithms and differentiating, we have for the maximum value equations 
typified by 
a mn m AY { (tu — Mew) + Gem —4@) } =, 


which reduce to 
x; (I) = bi; (l)° ° . . ry ry . (28.52) 


The maximum likelihood values of the m’s are then given by 
Miz y = bis 
Furthermore, the values of A¥ are then given by the inverse of the matrix (| 5, ), and the 


exponent of (28.51) becomes 


= in Dy (Au b;; w) SS énk, ‘ . . ° (28.53) 
We then find 
coe 
Pp, (@ max.) = >~——,. © 6 6 —« (28.54) 
ns 
In a similar way it will be found that 
cenit 
Po(w max.) = —~—_,, we .)Cw (28568) 
Dia 




Hence 
_ (|i 
a 
palm" 
1 be? 
=D 
Ba 
and we may write 
1, | 
2m | _ | by 
L=me= — Ls . e . e e 28.56 
E [Ba | ee 
n 9 


and take L as our criterion.


28.17. The distribution of L for general k is not easily expressible, but we may
determine its moments by the method employed in 28.15. The functions = b;; are dis- 


tributed in Wishart’s form and their moments accordingly given by equations of the type 
(28.42) with n replaced by n — 1, namely, 


By (| by |) = | 3 r i a ae - . » (28.57) 


had 


Now each $b_{ij(l)}$ is distributed in Wishart’s form, and therefore their sum is so distributed
(cf. Exercise 28.3). In the manner of 28.15—we omit the details—it is found that 


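\[ \mu'_r(L) = \prod_{m=1}^{p} \frac{\Gamma\left(\frac{n-m}{2}\right) \Gamma\left(\frac{n-m+1-k}{2} + r\right)}{\Gamma\left(\frac{n-m}{2} + r\right) \Gamma\left(\frac{n-m+1-k}{2}\right)}, \tag{28.58} \]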


where we now use m as an index of summation, reserving k for the number of samples. 
This gives us the moments of L. 
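In the case k = 2 we have

\[ \mu'_r = \frac{\Gamma\left(\frac{n-1}{2}\right) \Gamma\left(\frac{n-p-1}{2} + r\right)}{\Gamma\left(\frac{n-1}{2} + r\right) \Gamma\left(\frac{n-p-1}{2}\right)}, \tag{28.59} \]

and hence the distribution of L is in the form

\[ dF = \frac{1}{B\left(\frac{n-p-1}{2}, \frac{p}{2}\right)} \, L^{\frac{1}{2}(n-p-3)} (1 - L)^{\frac{1}{2}(p-2)} \, dL. \tag{28.60} \]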


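In the case k = 3 we find

\[ \mu'_r = \frac{\Gamma\left(\frac{n-1}{2}\right) \Gamma\left(\frac{n-2}{2}\right) \Gamma\left(\frac{n-p-1}{2} + r\right) \Gamma\left(\frac{n-p-2}{2} + r\right)}{\Gamma\left(\frac{n-1}{2} + r\right) \Gamma\left(\frac{n-2}{2} + r\right) \Gamma\left(\frac{n-p-1}{2}\right) \Gamma\left(\frac{n-p-2}{2}\right)}, \]

which, in virtue of the relation

\[ \Gamma(x) \, \Gamma\left(x + \tfrac{1}{2}\right) = \sqrt{\pi} \, 2^{1-2x} \, \Gamma(2x), \]

becomes

\[ \mu'_r = \frac{\Gamma(n-2) \, \Gamma(n-p-2+2r)}{\Gamma(n-2+2r) \, \Gamma(n-p-2)}. \tag{28.61} \]

These are the moments of the distribution

\[ dF = \frac{1}{2 B(n-p-2, \, p)} \, (\sqrt{L})^{n-p-4} (1 - \sqrt{L})^{p-1} \, dL, \tag{28.62} \]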


a rather unusual form. The results are due to Wilks. 


28.18. The line of generalisation of univariate analysis will now probably be clear. 
Corresponding to most of our results for a single variate there will be a generalised result 
for p variates ; and, in fact, if we like to regard the p-variate as a vector we can often draw 
direct analogies between results for vectors and those for the (univariate) scalar. It is 
of special interest to observe that the role played by the variance in univariate theory is 
taken over by the determinant of the dispersion matrix in multivariate theory. 

Up to this point we have generalised the distribution of variance (the χ²-distribution)
into Wishart’s form, and the t-distribution into Hotelling’s form.

Other results which suggest themselves for generalisation are regression and variance 
analysis. But in a sense our treatment of regressions is already general, for we have dis- 
cussed the regression of one variate on p — 1 others. Below we shall go further and 
examine the relations between p dependent and q independent variates. In vector lan- 
guage, we consider the regression of a p-way vector y on a q-way vector x. We have also 
considered the analysis of variance for the bivariate and trivariate case in Chapter 24 
under the title of analysis of covariance, and since the interest lies mainly in the direction 
of regressions we shall not take the subject further here, though it is capable of
development and even, perhaps, of application if data become available in sufficient abundance.
In the remainder of the chapter we shall, in the first instance, deal with an offshoot of 
regression theory which has some interesting taxonomic applications, namely discrimina- 
tory analysis ; and we shall then proceed to the general problem of the relationship between 
two sets of variates. 


Discriminatory Analysis 
28.19. Suppose we have p observations for each of 2n sample members, and that
each member can have emanated from one of two populations, n to each population. We
require to find some measurement depending on the p observations which will enable us
to assign subsequently drawn members correctly to their parent populations with the
greatest assurance of success. For this purpose we shall find p quantities $\lambda^1, \ldots, \lambda^p$ and
a discriminant function X related linearly to the variates by

\[ X = \lambda^i x_i. \tag{28.63} \]

The criterion on which we shall rely is that the λ’s must be chosen to maximise the ratio
of the difference between sample means to the standard deviation within the two classes.
Any linear function of type (28.63) has variance S, given by

\[ S = \lambda^i \lambda^j a_{ij}, \tag{28.64} \]




where, as usual, $a_{ij}$ is the covariance of $x_i$ and $x_j$, which we assume to be the same for both
populations. Further, if the difference between the two means of $x_i$ is $d_i$, the difference of the
function X for the two samples is

\[ D = \lambda^i d_i. \tag{28.65} \]

We have then to maximise for variation in λ the function

\[ \frac{D^2}{S} = \frac{(\lambda^i d_i)^2}{\lambda^i \lambda^j a_{ij}}. \tag{28.66} \]

This gives for each λ

\[ \frac{1}{2} \frac{\partial S}{\partial \lambda^i} = \frac{S}{D} \frac{\partial D}{\partial \lambda^i}, \]

leading to equations typified by

\[ \lambda^j a_{ij} = \frac{S}{D} d_i. \tag{28.67} \]

Multiplying by $a^{ik}$ and summing over i, we have

\[ \lambda^j a_{ij} a^{ik} = \frac{S}{D} d_i a^{ik}, \]

or, replacing k by j,

\[ \lambda^j = \frac{S}{D} d_i a^{ij}. \tag{28.68} \]

This determines the λ’s, except for the constant S/D, which can be chosen at will so far as the
discriminant function is concerned. If c is some constant, we have

\[ \lambda^j = c \, d_i a^{ij}. \tag{28.69} \]


The result also holds if there are n, members in the first sample and n, in the second. 
Equation (28.65) remains true, and the rest of the analysis is the same as for equal class- 
numbers. 


Example 28.2 (from R. A. Fisher, 1936a). 


Measurements were made on fifty specimens of flowers from each of two species of 
iris, setosa and versicolor, found growing in the same colony. Four measurements were 
taken, viz. sepal length, sepal width, petal length, and petal width. We denote them by 
$x_1$, $x_2$, $x_3$ and $x_4$ respectively.

The means of the specimens were (in centimetres):

Variate.    Versicolor.    Setosa.    Difference (V − S).
x_1           5.936         5.006          0.930
x_2           2.770         3.428        − 0.658
x_3           4.260         1.462          2.798
x_4           1.326         0.246          1.080




The sums of squares and products about the means were (in cm.²):

          x_1        x_2        x_3        x_4
x_1     19.1434     9.0356     9.7634     3.2394
x_2      9.0356    11.8658     4.6232     2.4746
x_3      9.7634     4.6232    12.2978     3.8794
x_4      3.2394     2.4746     3.8794     2.4604


The inverse matrix is, in cm.⁻²:

            x_1             x_2             x_3             x_4
x_1      0.118,7161    − 0.066,8666    − 0.081,6158      0.039,6350
x_2    − 0.066,8666      0.145,2736      0.033,4101    − 0.110,7529
x_3    − 0.081,6158      0.033,4101      0.219,3614    − 0.272,0206
x_4      0.039,6350    − 0.110,7529    − 0.272,0206      0.894,5506


We need not bother to divide these quantities by n because there is an arbitrary con- 
stant in our discriminant function which absorbs it. The matrices are diagonally sym- 
metric, and it is not always necessary to write out the values below the diagonal as we 
have done here. 

From (28.69), with c = 1, we then find— 


λ¹ = − 0.031,1511        λ² = − 0.183,9075
λ³ =   0.222,1044        λ⁴ =   0.314,7370.

If we choose the coefficient of $x_1$ to be unity the discriminant function is then

\[ X = x_1 + 5.9037 x_2 - 7.1299 x_3 - 10.1036 x_4. \tag{28.70} \]


The mean of X for versicolor, obtained by substituting the means of the x’s for that species,
is found to be − 21.4815, and that for setosa is 12.3345. The difference is thus 33.816 cm.
Let us compare this with its standard error to see whether it is significant of real differences 
in the values of X for the two species. 
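The arithmetic of this example can be reproduced from the tabulated inverse matrix and the differences of means. The sketch below applies (28.69) with c = 1, rescales so that the coefficient of the first variate is unity, and evaluates X at the two sets of species means. The variable and function names are illustrative.

```python
# Inverse of the matrix of sums of squares and products, as tabulated above.
A_INV = [
    [0.1187161, -0.0668666, -0.0816158, 0.0396350],
    [-0.0668666, 0.1452736, 0.0334101, -0.1107529],
    [-0.0816158, 0.0334101, 0.2193614, -0.2720206],
    [0.0396350, -0.1107529, -0.2720206, 0.8945506],
]
D = [0.930, -0.658, 2.798, 1.080]           # differences of means (V - S)
MEANS_VERSICOLOR = [5.936, 2.770, 4.260, 1.326]
MEANS_SETOSA = [5.006, 3.428, 1.462, 0.246]


def discriminant_coefficients(a_inv, d, c=1.0):
    """lambda^j = c * d_i * a^{ij}, summed over i, as in (28.69)."""
    size = len(d)
    return [c * sum(d[i] * a_inv[i][j] for i in range(size)) for j in range(size)]


def evaluate(coefficients, x):
    """Value of the linear function sum of coefficient times variate."""
    return sum(c * v for c, v in zip(coefficients, x))


lam = discriminant_coefficients(A_INV, D)
scaled = [value / lam[0] for value in lam]   # coefficient of x_1 made unity
x_versicolor = evaluate(scaled, MEANS_VERSICOLOR)
x_setosa = evaluate(scaled, MEANS_SETOSA)
difference = x_setosa - x_versicolor
```

Because the first coefficient is negative, dividing by it reverses the sign of the function, which is why versicolor has the smaller mean value of X.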

From the matrix of sums of squares and products we find 


\[ N \, \mathrm{var} \, X = \lambda^i \lambda^j a_{ij} = 1085.5522, \]

where the λ’s are, of course, the coefficients in (28.70). N here is the number of degrees
of freedom of the estimate of the variance. There are 100 members altogether, with 99
degrees of freedom, but we have eliminated four corresponding to the means of the four
variates. We therefore take N to be 99 − 4 = 95, and find

\[ \mathrm{var} \, X = 11.4269. \]

This is the variance of a single value. That of the difference of the two means of 50 values
is obtained by division by 25 and is thus 0.4571, the corresponding standard error being
0.676.

The observed difference of means, viz. 33.816, is about 50 times this amount, and
there is thus a real difference in the values of X for the two species. In other words the
discriminant function is a good one. It is best among the linear functions of the x’s because




we have chosen it so that the difference of two values, divided by their estimated standard
error, shall be the greatest possible. To use the function we should, given a flower of
doubtful species, calculate X for it and assign it to one species or the other according as
X were nearer to the mean value of X for one species or the other. If, of course,
the observed value differed from the mean values by more than twice the standard error
of each, we should begin to doubt whether it belonged to either.
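The significance calculation and the allocation rule can be sketched as follows, using only figures quoted in the example: the coefficients of (28.70), the two mean values of X, and N var X = 1085.5522 with N = 95. The function `allocate` is a hypothetical helper that implements the nearest-mean rule described above and nothing more.

```python
import math

COEFFS = (1.0, 5.9037, -7.1299, -10.1036)   # coefficients of (28.70)
MEAN_VERSICOLOR = -21.4815                   # mean of X for versicolor
MEAN_SETOSA = 12.3345                        # mean of X for setosa
N_VAR_X = 1085.5522                          # sum of squares within species
DEGREES_OF_FREEDOM = 95


def discriminant(x):
    """Value of X for measurements (x_1, x_2, x_3, x_4) in centimetres."""
    return sum(c * v for c, v in zip(COEFFS, x))


def allocate(x):
    """Assign a flower to the species whose mean value of X is nearer."""
    value = discriminant(x)
    if abs(value - MEAN_VERSICOLOR) < abs(value - MEAN_SETOSA):
        return "versicolor"
    return "setosa"


var_single = N_VAR_X / DEGREES_OF_FREEDOM        # variance of a single value
se_difference = math.sqrt(var_single / 25.0)     # two means of 50 values each
ratio = (MEAN_SETOSA - MEAN_VERSICOLOR) / se_difference
```

The ratio of the observed difference to its standard error comes out at about 50, in agreement with the text.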

The analysis may be put in rather a different way. Suppose we analyse the variation
of X between and within species. The sum of squares between species in the 50 × 2
classification is

\[ 50 \{ (\bar{X}_1 - \bar{X})^2 + (\bar{X}_2 - \bar{X})^2 \}, \]

where $\bar{X}_1$, $\bar{X}_2$ are the respective means and $\bar{X}$ the mean of the whole. This reduces to 25D².
The sum of squares within classes is 1085.55 with 95 d.f., as found above, and we have:

                      Sum of Squares.     d.f.
Between species         28,588.05           4
Within species           1,085.55          95
Totals                  29,673.60          99

Our method of selecting the discriminant function has been such as to minimise the sum
of squares within species and, for constant total, to maximise the sum between species,
and hence to maximise the ratio of the latter to the former. For the moment we cannot
assume that this ratio may be tested in the z-distribution in the usual way, though we shall
see presently that this is so.


28.20. The relationship of discriminatory analysis for two classes and the theory of 
regression may be brought out by introducing a formal variate y for the classes. If there 
are $n_1$ members in one class and $n_2$ in the other we shall assign the values

\[ \frac{n_2}{n_1 + n_2}, \qquad -\frac{n_1}{n_1 + n_2} \]

to the y-variate for the two classes respectively. The mean of y for the whole sample is
then zero and the sum of squares is

\[ \frac{n_1 n_2}{n_1 + n_2} = C, \ \text{say}. \tag{28.71} \]

Considering now

\[ X = \lambda^i x_i \tag{28.72} \]

as a regression equation, we find for the coefficients λ

\[ \Sigma (y x_i) - \lambda^j \Sigma (x_i x_j) = 0, \]

or

\[ \Sigma (y x_i) - \lambda^j a_{ij} = 0. \tag{28.73} \]

Now

\[ \Sigma (y x_i) = \frac{n_2}{n_1 + n_2} \Sigma_1 (x_i) - \frac{n_1}{n_1 + n_2} \Sigma_2 (x_i), \]

where the suffixes of the Σ’s relate to the first and second classes. Thus

\[ C d_i = \lambda^j a_{ij}, \tag{28.74} \]


which is another way of writing (28.69) with a particular value for the constant c. 
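The equivalence can be illustrated on simulated data. The sketch below uses NumPy with invented sample sizes and population means (none of these figures come from the text). It computes the discriminant coefficients from the within-class sums of squares and products, and separately regresses the formal variate y on the x’s measured from the general mean; the two coefficient vectors should be proportional.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, p = 30, 45, 3                       # illustrative sample sizes
x1 = rng.normal(size=(n1, p)) + np.array([1.0, 0.0, -0.5])
x2 = rng.normal(size=(n2, p))

# Discriminant coefficients: lambda proportional to (within-class SSP)^-1 d.
d = x1.mean(axis=0) - x2.mean(axis=0)
dev1 = x1 - x1.mean(axis=0)
dev2 = x2 - x2.mean(axis=0)
within = dev1.T @ dev1 + dev2.T @ dev2
lam_disc = np.linalg.solve(within, d)

# Regression of the formal variate y on the x's measured from the general mean.
x = np.vstack([x1, x2])
x = x - x.mean(axis=0)
y = np.concatenate([np.full(n1, n2 / (n1 + n2)), np.full(n2, -n1 / (n1 + n2))])
lam_reg = np.linalg.solve(x.T @ x, x.T @ y)

C = n1 * n2 / (n1 + n2)
cosine = lam_disc @ lam_reg / (np.linalg.norm(lam_disc) * np.linalg.norm(lam_reg))
```

A cosine of unity between the two vectors means that they differ only by a positive constant, which is the arbitrary c of (28.69).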


28.21. Pursuing the analogy with regression analysis further, we see that since 


\[ \Sigma (y^2) = C \]

and

\[ \lambda^i \Sigma (y x_i) = C \lambda^i d_i \]

we may analyse the sums of squares as:

Sums of squares.            d.f.
C λ^i d_i                   p
C (1 − λ^i d_i)             n_1 + n_2 − p − 1
C                           n_1 + n_2 − 1                    (28.75)

as for a regression line. If R is the multiple correlation of y on the x-variates,

\[ R^2 = \lambda^i d_i. \tag{28.76} \]


In ordinary regression analysis we may test the ratio R²/(1 − R²), multiplied by
suitable constants, in the z-distribution; but this depends on the assumption that the
dependent variate y is normal for any fixed x’s. Here we have the case when the dependent
variate is fixed but the x’s are normal. The test still holds in such a case, the reason being 
the kind of duality we noted in 28.14 in arriving at Hotelling’s distribution. The distri- 
bution of angles between a fixed plane and a random vector is the same as that between 
a fixed vector and a random plane. Consequently the table of (28.75) can be regarded 
as an analysis of variance and the z-test applied. 


28.22. We may extend the discriminant function to the case when the property to 
be discriminated is not, as above, a matter of allocation to one of two classes, but to several 
which may in particular be determined by certain values of a continuous variate. If we 
have various measurements of p x-variates corresponding to values of a y-variate, we may
form the regression of y on the x’s and use the resulting function as a discriminator. As 
in the case of dichotomy, the regression will maximise the difference between classes as 
compared with intra-class variation; and its significance may be tested in much the 
same way. 


Example 28.3 (from M. M. Barnard, 1935). 

An investigation was undertaken into the changes taking place over time of the char- 
acteristics of certain Egyptian skulls. There were four sets of skulls, known to be from 
Late Predynastic, Sixth to Twelfth, Twelfth to Thirteenth and Ptolemaic dynasties respect- 




ively, and the relative time-intervals were taken to be in the proportions 2:1: 2, so that 
the values of t for the four periods may be taken to be respectively — 5, — 1, + 1, + 5. 
For the skulls four measurements were selected : 


x_1, basi-alveolar length;
x_2, nasal height;
x_3, maximum breadth;
x_4, basi-bregmatic height.

It is required to find a function

\[ X = \lambda^1 x_1 + \lambda^2 x_2 + \lambda^3 x_3 + \lambda^4 x_4 \]


which will best discriminate between skulls belonging to different periods. 
The means of the series were as follows, the sample numbers also being shown :— 


Variate.    Series I         Series II        Series III       Series IV
            (n_1 = 91).      (n_2 = 162).     (n_3 = 70).      (n_4 = 75).
x_1         133.582,418      134.265,432      134.371,429      135.306,667
x_2          98.307,692       96.462,963       95.857,143       95.040,000
x_3          50.835,165       51.148,148       50.100,000       52.093,333
x_4         133.000,000      134.882,716      133.642,857      131.466,667


The sums of squares and products about the means are— 


          x_1              x_2              x_3              x_4
x_1    9661.997,470     445.573,301    1130.623,900    2148.584,219
x_2        ...         9073.115,027    1239.221,990    2255.812,722
x_3        ...              ...        3938.320,351    1271.054,662
x_4        ...              ...             ...        8741.508,829


The mean value of t, $\bar{t}$, for the 398 observations is − 0.432,161, and the values of $t - \bar{t}$
for the four series are accordingly

− 4.567,839;  − 0.567,839;  1.432,161;  5.432,161.

The sums $\Sigma x_i (t - \bar{t})$ are respectively

x_1       718.762,86
x_2    − 1407.260,75
x_3       410.101,94
x_4     − 733.668,32

and finally, $\Sigma (t - \bar{t})^2$ = 4307.668,32.

We could obtain the coefficients A from the reciprocal of the matrix above on the lines 
of the previous example. It is also instructive to observe, from the analogy with regres- 
sions, that instead of that matrix we may use the matrix (depending on one extra degree 
of freedom, 395 in all) obtained by adding to the sums of squares the regressions on time. 


For instance, instead of 9661.997,470 we have 9661.997,470 + (718.762,86)²/4307.668,32.




The resulting matrix is 


          x_1              x_2              x_3              x_4
x_1    9781.927,828     210.762,489    1199.052,135    2026.206,952
x_2        ...         9532.849,476    1105.246,827    2405.414,318
x_3        ...              ...        3977.363,203    1201.230,304
x_4        ...              ...             ...        8866.382,928


The reciprocal of this is (units = 10⁻⁶):

          x_1             x_2             x_3             x_4
x_1    110.368,975       6.938,481    − 28.145,236    − 23.361,935
x_2        ...         115.693,529    − 24.948,984    − 30.767,069
x_3        ...             ...         273.988,409    − 23.666,591
x_4        ...             ...             ...         129.990,069


The resulting values of λ are

λ¹ =   0.075,156,739,    λ² = − 0.145,490,050,
λ³ =   0.144,600,884,    λ⁴ = − 0.078,538,419


and these, or constant multiples of them, give us the constants in the discriminant function 
which will best enable us to assign a skull to the correct period by measurements of the 
four specified variates. 

In this analysis we have 398 members, but of the 397 d.f. we have discarded two with 
the general mean. The d.f. of the sum 4307.6683 = $\Sigma (t - \bar{t})^2$ are 395, of which four are
attributable to regressions on the other variates. For the contribution of these four we
have

\[ \lambda^i \Sigma x_i (t - \bar{t}) = \lambda^1 \times 718.762{,}86 + \text{etc.} = 375.6657. \]

The analysis of variance is thus:

                Sum of Squares.     d.f.     Quotient.
Regression         375.6657           4        ...
Remainder         3932.0026         391      10.0563
Totals            4307.6683         395


The analogy of the discriminant function with regressions noted above may be used 
to provide standard errors of the coefficients λ. In our present case the variance of λ¹
is obtained by multiplying the remainder quotient, viz. 10.0563, by the term corresponding
to $x_1^2$ in the reciprocal matrix of sums of squares of the x’s, namely 110.368,975 × 10⁻⁶.
This gives a standard error of 0.0333. We obtain finally

λ¹ =   0.0752 ± 0.0333
λ² = − 0.1455 ± 0.0341
λ³ =   0.1446 ± 0.0525
λ⁴ = − 0.0785 ± 0.0362.


All coefficients exceed twice their standard error, and hence all the variates are useful in 
discriminating between skulls of different periods. 
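The standard errors quoted above follow directly from the remainder quotient and the diagonal of the reciprocal matrix. A minimal sketch of that arithmetic, using only figures given in the example (the variable names are illustrative):

```python
import math

REMAINDER_QUOTIENT = 10.0563
# Diagonal terms of the reciprocal matrix (tabulated in units of 10^-6).
RECIPROCAL_DIAGONAL = [110.368975e-6, 115.693529e-6, 273.988409e-6, 129.990069e-6]
COEFFICIENTS = [0.075156739, -0.145490050, 0.144600884, -0.078538419]

# Variance of each coefficient = remainder quotient times the diagonal term.
standard_errors = [math.sqrt(REMAINDER_QUOTIENT * term) for term in RECIPROCAL_DIAGONAL]
ratios = [abs(c) / se for c, se in zip(COEFFICIENTS, standard_errors)]
```

Each ratio exceeds 2, which is the basis for the statement that all four variates contribute to the discrimination.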

I am indebted to Dr. M. S. Bartlett for the calculations of this example. His results 
differ from those reached by Miss Barnard in her original investigation since she took an 
unweighted regression of the variates with time, whereas he has weighted the values 
according to sample numbers. He also notes that the significance of the results has been 
tested above on the basis of variability within classes, but that a fuller analysis of the means, 
bringing back the two degrees of freedom discarded, reveals further differences between the 
series. Thus, though the discriminant function will efficiently sort the series examined in 
relation to their periods, we must be cautious about associating the observed differences 
with the time-changes. 


Canonical Correlations 


28.23. We now turn to consider the general theory of the relations between two 


sets of variates $x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that p ≤ q. Following
Hotelling (1936), we shall show that in general there can be found linear transformations
to variates $\xi_1, \ldots, \xi_p$, $\xi_{p+1}, \ldots, \xi_{p+q}$ such that

(a) all the ξ’s have unit variance and zero mean;

(b) any ξ in the p-group is independent of the other ξ’s in that group;

(c) any ξ in the q-group is independent of the other ξ’s in that group;

(d) the correlation between any ξ in the p-group and any ξ in the q-group is zero except
for p correlations $\rho_1, \ldots, \rho_p$, which may be taken to be the correlations between
$\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates ξ are then said to be canonical variates and the ρ’s canonical correlations.

This part of our work is, fundamentally, the reduction of two quadratic forms and an 
associated bilinear form to canonical types and does not depend on the distribution laws 
of the variates. Furthermore, the reduction can be carried out either on the population 
or on the sample. In the latter case it will yield sample canonical correlations which may 
be written r, . . . 7, and regarded as sample-values of the parent p’s. 

We will suppose that our variates x have zero means and dispersions denoted by $\sigma_{ij}$,
where, for the time being, we use σ to denote a variance or covariance instead of the more
usual σ². Those dispersions in the p-group we denote by Greek affixes: $\sigma_{\alpha\beta}$, and those
in the q-group by Roman affixes: $\sigma_{ij}$. For a covariance of a p-variate with a q-variate
we write one Greek and one Roman affix: $\sigma_{\alpha i}$.
Consider now a particular pair of variates given by

\[ \xi = c^{\alpha} x_{\alpha}, \qquad \eta = d^{i} x_{i}. \tag{28.77} \]

If their variances are unity we have

\[ c^{\alpha} c^{\beta} \sigma_{\alpha\beta} = 1, \qquad d^{i} d^{j} \sigma_{ij} = 1. \tag{28.78} \]




We will also impose the condition that their correlation R is stationary for variations in 
the coefficients c and d, i.e. that 


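\[ R = c^{\alpha} d^{i} \sigma_{\alpha i} = \text{stationary}. \tag{28.79} \]

Equations (28.78) and (28.79) then require an unconditioned stationary value of

\[ c^{\alpha} d^{i} \sigma_{\alpha i} - \tfrac{1}{2} \lambda c^{\alpha} c^{\beta} \sigma_{\alpha\beta} - \tfrac{1}{2} \mu d^{i} d^{j} \sigma_{ij}, \tag{28.80} \]

where λ and μ are undetermined multipliers. This leads to

\[ c^{\alpha} \sigma_{\alpha i} - \mu d^{j} \sigma_{ij} = 0, \qquad d^{i} \sigma_{\alpha i} - \lambda c^{\beta} \sigma_{\alpha\beta} = 0. \tag{28.81} \]

Multiplying the first equation by $d^{i}$ and summing and the second by $c^{\alpha}$ and summing,
we have, in virtue of (28.78) and (28.79),

\[ \lambda = \mu = R. \tag{28.82} \]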


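Equations (28.81) will then be soluble for the p + q unknowns c and d if the determinant
of their array vanishes, that is if, writing λ for the constants μ and λ,

\[ \begin{vmatrix}
-\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1p} & \sigma_{1,p+1} & \cdots & \sigma_{1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
-\lambda\sigma_{p1} & \cdots & -\lambda\sigma_{pp} & \sigma_{p,p+1} & \cdots & \sigma_{p,p+q} \\
\sigma_{p+1,1} & \cdots & \sigma_{p+1,p} & -\lambda\sigma_{p+1,p+1} & \cdots & -\lambda\sigma_{p+1,p+q} \\
\vdots & & \vdots & \vdots & & \vdots \\
\sigma_{p+q,1} & \cdots & \sigma_{p+q,p} & -\lambda\sigma_{p+q,p+1} & \cdots & -\lambda\sigma_{p+q,p+q}
\end{vmatrix} = 0, \tag{28.83} \]

an equation determining λ. Before studying it further we will throw the equation into
a somewhat different form.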


28.24. We may write (28.83) as 


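\[ \begin{vmatrix} -\lambda\sigma_{\alpha\beta} & \sigma_{\alpha i} \\ \sigma_{j\beta} & -\lambda\sigma_{ij} \end{vmatrix} = 0. \tag{28.84} \]

Multiplying the first p rows by − λ and dividing the last q columns by − λ we find the
equivalent form

\[ \begin{vmatrix} \lambda^2\sigma_{\alpha\beta} & \sigma_{\alpha i} \\ \sigma_{j\beta} & \sigma_{ij} \end{vmatrix} = 0. \tag{28.85} \]

Writing, in conformity with our usual notation, $(\sigma^{ij})$ for the matrix inverse to $(\sigma_{ij})$ and
remembering that

\[ \sigma^{ij} \sigma_{jk} = \delta^{i}_{k}, \]

let us multiply (28.85) on the left by

\[ \begin{vmatrix} \delta_{\alpha\beta} & -\sigma_{\alpha i}\sigma^{ij} \\ 0 & \sigma^{ij} \end{vmatrix}. \tag{28.86} \]

The product of determinants is then

\[ \begin{vmatrix} \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta} & \sigma_{\alpha k} - \sigma_{\alpha i}\sigma^{ij}\sigma_{jk} \\ \sigma^{ij}\sigma_{j\beta} & \sigma^{ij}\sigma_{jk} \end{vmatrix} = \begin{vmatrix} \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta} & 0 \\ \sigma^{ij}\sigma_{j\beta} & \delta^{i}_{k} \end{vmatrix}, \]

which gives

\[ (-\lambda)^{q-p} \, \left| \lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha i}\sigma^{ij}\sigma_{j\beta} \right| = 0, \tag{28.87} \]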


a determinant with p rows and columns multiplied by a power of A. 


28.25. Returning now to our original problem, we see that if a simple root of (28.83) 
is substituted in (28.81) the c’s and d’s are determinate, except of course that they may be 
replaced by —c and —d. For a root of multiplicity m they are determinate except for 
m — 1 assignable constants, a result we take without proof from the theory of algebraic 
forms (reference may be made to Hotelling’s paper for details). 

From (28.87) we see that the equation in λ has p + q roots. It cannot have fewer,
for the coefficient of the highest power of λ in (28.83) is the product of two principal minors
which do not vanish unless the variates are linearly dependent, a case which we exclude
from the discussion. Of these p + q roots q − p are zero. The remaining 2p can be
grouped in pairs, each of which is the negative of the other. There are thus roots which
we may write $\pm\rho_1, \ldots, \pm\rho_p$. We choose as the roots those which are not negative and
proceed to prove that they are the canonical correlations as we have defined them. That
they are, in fact, correlations follows from (28.82).

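Suppose we have a root $\rho_\gamma$ and determine the corresponding constants $c_\gamma$ and $d_\gamma$, and
hence a pair of variates $\xi_\gamma$ and $\eta_\gamma$. Then we have, from (28.81),

\[ c^{\alpha}_{\gamma} \sigma_{\alpha i} = \rho_\gamma d^{j}_{\gamma} \sigma_{ij}, \qquad d^{i}_{\gamma} \sigma_{\alpha i} = \rho_\gamma c^{\beta}_{\gamma} \sigma_{\alpha\beta}. \tag{28.88} \]

Similar equations obtain for a second pair, say $\xi_\delta$ and $\eta_\delta$. Between these four variates
there are six correlations, two of which are $\rho_\gamma$ and $\rho_\delta$. We wish to show that the other
four vanish. They are

\[ E(\xi_\gamma \xi_\delta) = c^{\alpha}_{\gamma} c^{\beta}_{\delta} \sigma_{\alpha\beta}, \quad E(\eta_\gamma \eta_\delta) = d^{i}_{\gamma} d^{j}_{\delta} \sigma_{ij}, \quad E(\xi_\gamma \eta_\delta) = c^{\alpha}_{\gamma} d^{i}_{\delta} \sigma_{\alpha i}, \quad E(\xi_\delta \eta_\gamma) = c^{\alpha}_{\delta} d^{i}_{\gamma} \sigma_{\alpha i}. \tag{28.89} \]

Multiply the first of (28.88) by $d^{i}_{\delta}$ and sum. Using (28.89), we have

\[ E(\xi_\gamma \eta_\delta) = \rho_\gamma E(\eta_\gamma \eta_\delta). \tag{28.90} \]

Similarly from the second of (28.88) multiplied by $c^{\alpha}_{\delta}$,

\[ E(\xi_\delta \eta_\gamma) = \rho_\gamma E(\xi_\gamma \xi_\delta). \tag{28.91} \]

Interchanging γ and δ we find from (28.90) and (28.91)

\[ \rho_\gamma E(\eta_\gamma \eta_\delta) = \rho_\delta E(\xi_\gamma \xi_\delta). \tag{28.92} \]

Equally, again interchanging γ and δ in (28.92) we have

\[ \rho_\delta E(\eta_\gamma \eta_\delta) = \rho_\gamma E(\xi_\gamma \xi_\delta). \tag{28.93} \]

Thus, unless $\rho_\gamma^2 = \rho_\delta^2$,

\[ E(\xi_\gamma \xi_\delta) = E(\eta_\gamma \eta_\delta) = 0. \tag{28.94} \]

It follows from (28.90) and (28.91) that the other correlations also vanish.
We have only to round off the proof by showing that if ρ is a root of multiplicity m
the property still holds. This follows from the consideration that we may then choose
our c’s and d’s to obey certain orthogonal conditions ensuring that

\[ E(\xi_\gamma \xi_\delta) + E(\eta_\gamma \eta_\delta) = 0. \tag{28.95} \]

It will then follow from (28.92) that each expectation vanishes unless $\rho_\gamma = \rho_\delta = 0$; and
even in this case, (28.91) and (28.92) show that two expectations vanish, and we may then
choose our assignable constants so that the others vanish.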


28.26. When the variates are put into canonical form the dispersion matrix reduces to 


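\[ \begin{pmatrix} I_p & P & 0 \\ P & I_p & 0 \\ 0 & 0 & I_{q-p} \end{pmatrix}, \qquad P = \mathrm{diag}(\rho_1, \rho_2, \ldots, \rho_p), \]

where I denotes a unit matrix of the order shown, with a determinant equal to

\[ (1 - \rho_1^2)(1 - \rho_2^2) \cdots (1 - \rho_p^2). \]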


Example 28.4 (from Hotelling, 1936b, dealing with data of T. L. Kelley).

140 seventh-grade school children were given four tests in (a) reading speed, (b) reading 
power, (c) arithmetic speed, and (d) arithmetic power. It is required to find canonical 
variates for the two reading tests and the two arithmetic tests. 

The correlations between the variates were— 


         x_1        x_2        x_3        x_4
x_1     1.0000     0.6328     0.2412     0.0586
x_2     0.6328     1.0000   − 0.0553     0.0655
x_3     0.2412   − 0.0553     1.0000     0.4248
x_4     0.0586     0.0655     0.4248     1.0000


The determinant (28.83) becomes 


\[ \begin{vmatrix}
-\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\
-0.6328\lambda & -\lambda & -0.0553 & 0.0655 \\
0.2412 & -0.0553 & -\lambda & -0.4248\lambda \\
0.0586 & 0.0655 & -0.4248\lambda & -\lambda
\end{vmatrix} = 0, \tag{28.96} \]

or

0.491,370 λ⁴ − 0.078,803,4 λ² + 0.000,362,490 = 0,

giving λ² = 0.155,635 or 0.004,740

with λ = 0.3945 or 0.0688.


To find the transformed variates themselves we use (28.81). For instance, with the
root 0.3945 for μ, we have

          c¹ + 0.6328 c² − 0.6114 d¹ − 0.1485 d² = 0
   0.6328 c¹ +        c² + 0.1402 d¹ − 0.1660 d² = 0
 − 0.6114 c¹ + 0.1402 c² +        d¹ + 0.4248 d² = 0
 − 0.1485 c¹ − 0.1660 c² + 0.4248 d¹ +        d² = 0.

The last equation is linearly dependent on the other three, so adds nothing. In the other
three we solve for the ratios of c’s and d’s, finding

   c¹ : c² : d¹ : d² = − 2.7772 : 2.2655 : − 2.4404 : 1.

Thus the canonical variates are

   k_1 ξ_1 = − 2.7772 x_1 + 2.2655 x_2,
   k_2 η_1 = − 2.4404 x_3 + x_4,

where k_1 and k_2 may be chosen so that the variances of ξ_1 and η_1 are unity, if desired. Similar
equations with the root 0.0688 will give us a further pair of canonical co-ordinates. Those
we have worked out have the maximum correlation, the other pair having the minimum
and therefore being of less interest.
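The canonical correlations of this example can also be obtained numerically from the determinantal equation in the form (28.87), which is the eigenvalue problem for the matrix formed from the correlation blocks as R11⁻¹ R12 R22⁻¹ R21. The sketch below uses NumPy; the block names R11, R12, R21, R22 and the other variable names are illustrative and not the text's notation.

```python
import numpy as np

# Correlation matrix of the four tests, as tabulated in the example.
R = np.array([
    [1.0000, 0.6328, 0.2412, 0.0586],
    [0.6328, 1.0000, -0.0553, 0.0655],
    [0.2412, -0.0553, 1.0000, 0.4248],
    [0.0586, 0.0655, 0.4248, 1.0000],
])
p = 2  # the two reading tests form the first group
R11, R12 = R[:p, :p], R[:p, p:]
R21, R22 = R[p:, :p], R[p:, p:]

# Eigenvalues of R11^-1 R12 R22^-1 R21 are the squared canonical correlations.
M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R21)
eigenvalues, eigenvectors = np.linalg.eig(M)
order = np.argsort(eigenvalues.real)[::-1]
rho = np.sqrt(eigenvalues.real[order])

# Coefficients for the largest root: c from the eigenvector, d from (28.81).
c = eigenvectors[:, order[0]].real
d = np.linalg.solve(R22, R21 @ c) / rho[0]
c_ratio = c[0] / c[1]
d_ratio = d[0] / d[1]
```

The ratios c¹ : c² and d¹ : d² are fixed whatever scaling the eigenvector routine happens to return, so they can be compared directly with the coefficients found above.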


28.27. In practical cases it is of some importance to know whether an observed 
canonical correlation r,, say, is significant of real correlation. The problem has been solved 
for large samples but not completely for small samples. We shall conclude this chapter 
with a short account of the main results which have been reached. 

For large samples we shall show that, for the standard error of a canonical correlation, 


\[ \mathrm{var} \, r = \frac{1}{n} (1 - \rho^2)^2, \tag{28.97} \]


a remarkable result showing that the variance is the same as for a product-moment 
coefficient. 


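Denoting as usual the sample covariance by $a_{ij}$ we have to the first order

\[ E(a_{ij}) = \sigma_{ij}. \tag{28.98} \]

To the same order,

\[ E(a_{ij} a_{kl}) = \frac{1}{n^2} E\left\{ \Sigma (x_{i\alpha} x_{j\alpha}) \, \Sigma (x_{k\beta} x_{l\beta}) \right\}. \]

If α ≠ β the sums on the right are independent, and there are n(n − 1) such cases. When
α = β we have n terms such as

\[ E(x_{i\alpha} x_{j\alpha} x_{k\alpha} x_{l\alpha}) = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}, \tag{28.99} \]

as follows from the consideration that the characteristic function of the multivariate normal
form is

\[ \exp\left( -\tfrac{1}{2} \sigma_{ij} t_i t_j \right) \]

(cf. 15.12, vol. I, p. 376).

Hence we have

\[ E(a_{ij} a_{kl}) = \frac{n(n-1)}{n^2} \sigma_{ij}\sigma_{kl} + \frac{n}{n^2} \left( \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} \right) = \sigma_{ij}\sigma_{kl} + \frac{1}{n} \left( \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} \right). \tag{28.100} \]

Thus

\[ E(da_{ij} \, da_{kl}) = E(a_{ij} a_{kl}) - \sigma_{ij}\sigma_{kl} = \frac{1}{n} \left( \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk} \right). \tag{28.101} \]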
Now for any canonical correlation r we have

\[ c^{\alpha} c^{\beta} a_{\alpha\beta} = 1, \qquad d^{i} d^{j} a_{ij} = 1, \qquad r = c^{\alpha} d^{i} a_{\alpha i}. \tag{28.102} \]
If now we define for the sampling deviations in c’s and d’s corresponding to deviations 
in the a’s, 


fe ee CS LO 
t,u PA, 
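we find

\[ 2 a_{\alpha\beta} c^{\alpha} \Delta c^{\beta} + c^{\alpha} c^{\beta} \Delta a_{\alpha\beta} = 0, \qquad 2 a_{ij} d^{i} \Delta d^{j} + d^{i} d^{j} \Delta a_{ij} = 0, \qquad \Delta r = a_{\alpha i} c^{\alpha} \Delta d^{i} + a_{\alpha i} d^{i} \Delta c^{\alpha} + c^{\alpha} d^{i} \Delta a_{\alpha i}. \tag{28.104} \]

Without loss of generality we may now suppose the variates canonical and hence put
$c^1 = 1$, $c^2 = \ldots = c^p = 0$, $d^1 = 1$, $d^2 = \ldots = d^q = 0$. We then find

\[ 2 \Delta c^1 + \Delta a_{11} = 0, \qquad 2 \Delta d^1 + \Delta a_{p+1,p+1} = 0, \qquad \Delta r_1 = \rho_1 \Delta d^1 + \rho_1 \Delta c^1 + \Delta a_{1,p+1}. \tag{28.105} \]

Substituting from the first two in the third of these equations we have

\[ \Delta r_1 = \Delta a_{1,p+1} - \tfrac{1}{2} \rho_1 \left( \Delta a_{11} + \Delta a_{p+1,p+1} \right). \tag{28.106} \]

Similar equations apply for any other simple root, e.g.

\[ \Delta r_2 = \Delta a_{2,p+2} - \tfrac{1}{2} \rho_2 \left( \Delta a_{22} + \Delta a_{p+2,p+2} \right). \]

Squaring these equations and substituting from (28.101) we find

\[ n E(\Delta r_1)^2 = (1 - \rho_1^2)^2, \qquad E(\Delta r_1 \, \Delta r_2) = 0. \]

It follows that

\[ \mathrm{var} \, r_1 = \frac{1}{n} (1 - \rho_1^2)^2, \qquad \mathrm{cov} \, (r_1, r_2) = 0, \tag{28.107} \]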


to our order of approximation. 
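By way of illustration, the large-sample formula may be applied to the two canonical correlations found in Example 28.4, where n = 140. The sketch below substitutes the sample value r for the unknown ρ, which is an assumption of the illustration rather than a step taken in the text, and the function name is hypothetical.

```python
import math


def canonical_correlation_se(r, n):
    """Large-sample standard error (1 - r^2) / sqrt(n), with r standing in for rho."""
    return (1.0 - r * r) / math.sqrt(n)


N = 140                                  # children tested in Example 28.4
se_first = canonical_correlation_se(0.3945, N)
se_second = canonical_correlation_se(0.0688, N)
ratio_first = 0.3945 / se_first
ratio_second = 0.0688 / se_second
```

On this rough basis the larger correlation is several times its standard error, whereas the smaller is less than its own standard error. For a correlation so near zero the normal approximation is itself doubtful.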


28.28. Equation (28.107) applies to a simple non-vanishing correlation. If a canon- 
ical correlation vanishes and p = q, the result holds, with the qualification that sample 
values of r near the zero root must be allowed to have positive or negative values, or alter- 
natively that the distribution of r is that of absolute values of a normal variate (cf. Exercise 
28.7). If p = 2, q > 2, a zero root is of multiplicity q at least. In this case, if it has exactly
multiplicity q, nr² is distributed as χ² with q − 1 degrees of freedom. For the proof of
this result see Hotelling (1936b).

There is another rather curious difficulty in testing the significance of roots of the 
equation giving the canonical correlations, namely, that if several roots exist it is not pos- 
sible to relate them with certainty to specified parent correlations—any one might have 
arisen from any one of the parent values. This is not serious for large samples when the 
roots are distinct, since the sample values cluster closely round the parent values; but 
for small samples or canonical correlations in the parent which are close together it presents 
a theoretical problem of a novel kind. See Hotelling (1936b) and Bartlett (1941) on 
this point. 


28.29. We proceed to find the sampling distribution of canonical correlations in the 
case when the parent values are all zero and the p-variates and q-variates accordingly 
independent. 

Reverting to equation (28.87) in the form appropriate to samples, we have 


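$$| \lambda^2 a_{ij} - a_{i\alpha} a^{\alpha\beta} a_{\beta j} | = 0, \qquad (28.108)$$
where Greek suffixes refer to the q-variates. We write
$$t_{ij} = a_{i\alpha} a^{\alpha\beta} a_{\beta j} \qquad (28.109)$$
and
$$a_{ij} = z_{ij} + t_{ij}, \qquad (28.110)$$
so that (28.108) becomes
$$| \lambda^2 (z_{ij} + t_{ij}) - t_{ij} | = 0. \qquad (28.111)$$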


The significance of this device is that z and t are distributed independently in Wishart’s
form, as we now proceed to show.

One instructive way of looking at the problem is to consider the regression of the 
p-way vector y on a q-way vector x. Corresponding to the univariate equation 


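$$y = bx + e, \qquad (28.112)$$

where e is a residual, we have
$$y_i = b_i^{\alpha} x_{\alpha} + e_i, \qquad (28.113)$$

where Greek suffixes refer to the x's and the b’s are given by minimising the sum of n values
$$\Sigma (y_i - b_i^{\alpha} x_{\alpha})^2,$$
namely, by
$$\Sigma (y_i x_{\beta}) - b_i^{\alpha} \,\Sigma (x_{\alpha} x_{\beta}) = 0$$
or, in our notation for canonical variates,
$$a_{i\beta} - b_i^{\alpha} a_{\alpha\beta} = 0,$$
which yields
$$b_i^{\alpha} = a_{i\beta} a^{\beta\alpha}. \qquad (28.114)$$
We may analyse the variance of y in the form
$$\Sigma (y_i y_j) = \Sigma (b_i^{\alpha} x_{\alpha} + e_i)(b_j^{\beta} x_{\beta} + e_j) = b_i^{\alpha} b_j^{\beta} a_{\alpha\beta} + \Sigma (e_i e_j), \qquad (28.115)$$
corresponding to the univariate case
$$\Sigma (y^2) = b^2 \,\Sigma (x^2) + \Sigma (e^2),$$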


and the two constituents on the right in (28.115) are independent, just as in the univariate 
case. This may be shown by a direct extension of 22.19. 




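Furthermore, if we wish to find the linear function of the y’s, say $\lambda^i y_i$, which has
maximum correlation with the x’s, we have to maximise the ratio
$$\frac{\Sigma (\lambda^i b_i^{\alpha} x_{\alpha})^2}{\Sigma (\lambda^i y_i)^2} = \frac{\lambda^i \lambda^j b_i^{\alpha} b_j^{\beta} a_{\alpha\beta}}{\lambda^i \lambda^j a_{ij}}. \qquad (28.116)$$
This is equivalent to maximising unconditionally
$$\lambda^i \lambda^j (b_i^{\alpha} b_j^{\beta} a_{\alpha\beta} - r^2 a_{ij}),$$
giving, for r², the equation
$$| b_i^{\alpha} b_j^{\beta} a_{\alpha\beta} - r^2 a_{ij} | = 0. \qquad (28.117)$$
Now in virtue of (28.114) this reduces to
$$| r^2 a_{ij} - a_{i\gamma} a^{\gamma\alpha} a_{j\delta} a^{\delta\beta} a_{\alpha\beta} | = 0$$
or
$$| r^2 a_{ij} - a_{i\alpha} a^{\alpha\beta} a_{\beta j} | = 0, \qquad (28.118)$$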


which is equivalent to (28.108) with a slight change of notation. This must be so, for 
we arrived at both equations on essentially the same assumptions. Now we see that the 
term on the right in the determinant of (28.118) is the first item on the right of the variance 
analysis given by (28.115), and the other term in the determinant is the sum Σ(y²) of the
analysis. It follows that z and t of (28.111) are independent, for they are the constituent
items of the analysis. Furthermore, the z's will be distributed as sums of squares or products
about the means with n − q degrees of freedom, that is in Wishart’s form; and
similarly the t’s are distributed as q sums of squares or products about the origin, i.e. in
Wishart’s form with n = q + 1.


28.30. Without loss of generality we may take the parent variances to be unity ; 
the covariances are zero by hypothesis. The joint distribution of z and ¢ is then, from 
(28.26), 


p 
(eave pat [te P=?) exp { im BD bi ae su) Hat dz 
dF =— 7 a ee 
Qi intl) pip pp Jp (tata \ (tet 
pet 2 2 


(28.119)

In the determinant
$$| \lambda^2 (z + t) - t | = 0$$
put u = λ² and let the roots in u be arranged in descending order of magnitude. Consider
the distribution for a given value of $t_{ij} + z_{ij}$ which in particular we take to be $\delta_{ij}$. Let us
choose new variates from a set $\xi_{ij}$ obeying the orthogonality conditions
$$\sum_{k=1}^{p} (\xi_{ik} \xi_{jk}) = \delta_{ij} \quad (= 0 \text{ if } i \neq j, \; = 1 \text{ if } i = j). \qquad (28.120)$$
Make the transformation
$$t_{ij} = \sum_{k} (\xi_{ik} \xi_{jk} u_k), \qquad (28.121)$$
$$z_{ij} + t_{ij} = \sum_{k} (\xi_{ik} \xi_{jk}) = \delta_{ij}. \qquad (28.122)$$

Instead of the ½p(p + 1) values of $t_{ij}$ we will take the p values of u and ½p(p − 1) of the
ξ’s as our new variates. We have
$$| t | = | \xi_{ik} \xi_{jk} u_k | = \prod_{i=1}^{p} u_i \qquad (28.123)$$
$$| z | = | \xi_{ik} \xi_{jk} (1 - u_k) | = \prod_{i=1}^{p} (1 - u_i) \qquad (28.124)$$


and have only to consider the Jacobian. This is clearly of degree ½p(p − 1) in u, for the
Jacobian of t and z + t is the same as that of t and z and only t contributes factors in u
in the former. Furthermore, every term $(u_i - u_j)$, i < j, is a factor of J. For consider
$u_1 - u_2$ and let us take as our ξ-variates those for which j > i. Then to satisfy the conditions
on the others, derivable from (28.120),


0 
3, 2 (E:x x) = 9, 


Gin En Os _ Fi 


OE 19 Ex 0812 Eu 


0&,; 


we must have 


= 0, ] 2, 
TE < 
a, 2 
whence ze. = 3E, & (Ex Fj Ux) 


= — 2222 (aw, — 1) e e 7 5 


. (28.125) 


Thus every term $(u_i - u_j)$ occurs in J, and there can be no further factors in u because
the power in u is ½p(p − 1).
Substituting in (28.119) we have, integrating out the ξ-variates,


p 
dF=c 7 (Wi@?-V (1 —u)t 2? TT (u;, — u;) du » (28.126) 
where 
k 


arr} Me 7) 


The constant k arises from terms involving n and p in the original density and from the 
Jacobian. It therefore does not involve q and may be written k(n, p). Evaluation of
k by direct integration is a matter of some difficulty, but we may find it indirectly 
as follows :— 


In (28.126), if we increase q and n by 2s, the corresponding value of c is 


k(n + 2s, p) 


n{r(itie eat) pest) - 6 — « (28,128) 


The only other term in (28.126) which is affected is that in Π(u) and, with the original
c of (28.127), the integral of the distribution so modified would give us the moment of
order s of Π(u), namely of |t|. This may be found in the manner of 28.15 to be


r(tte seat) (tt) 
9 


a 2 
(2 ie 
aa i a) 
(see: Exercise 28.11). It follows that 
k(n + 2s, p) = a 2 j 5 z , D (28.130) 
k (n, p) r (? = ‘) 
2, 


whence 


k(n p) = HT ("S*)F(p). . ee (28,181) 


It remains to evaluate f(p). To do so we make the substitution in (28.126)


letting n tend to infinity. Our distribution becomes
3 (q—p—1) , 
ap —/ (p) (He) exp(—2Z0,) I (v,—) dv. . —. (28.132) 


pt+2—i 
(ea) 
This may be reduced by successive substitutions of the type 
UW — Wi, v; = W; + V1; i I, 


and choosing q at each stage so that the term in Π(v) vanishes (as we may, since the result
is independent of q). On integration for $v_1$, then repeating the process, and so on, we find


of) UT (p+1—i)_, 
p+2—1 Qi (p—1) : 
ar(t : ) 
Using the relation 
$$\Gamma(x) \,\Gamma(x + \tfrac{1}{2}) = 2^{1-2x} \sqrt{\pi} \,\Gamma(2x),$$
we have 
ip 
f(p) = Z ee rn OES, 


Thus our distribution is finally 
dF =cll (uit P-? (1 — ui Pe) 5 IT (u, — uy) HT du, . (28.134) 


n—1 
: ie) 
= xP = ; ~ 
c=n2 ep GIS) (fart a (2 el — . (28.135) 
2 2 2 
a remarkable form obtained in the general case by Fisher (1939), P. L. Hsu (1939b), and
Roy (1939b).


where 




We have supposed throughout that q > p. In the contrary case we reverse the roles
of q and p and hence merely have to interchange p and q in (28.134) and (28.135).


28.31. Let us consider some special cases. When q = 1 the distribution becomes


e) 
dF = ; uk (P—2) (1 — u,)# @-P-9) day, » (28:136 
ee ee ie 
2 2 
confirming the distribution of equation (28.40) leading to Hotelling’s distribution; for
the canonical correlation is then the multiple correlation between the q-variate and the
p-variates; and as the former is measured from its mean there is one fewer degree of
freedom, i.e. n is replaced by n − 1.
When q = 2 we have


alr el 
ge) a)? aoe 


aR = (uty ha)#®-8) { (1m) (Lm) FH PH 


x (uy ee Ue) du, du. ° (28.137) 


Writing
$$(1 - u_1)(1 - u_2) = v, \qquad u_1 + u_2 = w,$$
we find
dF = ia) (v — 1 + w)H@-3) pt --9) dy dw. . (28.138) 


40 (n — p — 2) '(p — 1) 
For given v the limits of w are 1 − v and 2(1 − √v), and integrating for w we find
I’ (n — 2) 2 


ar = 4." (n = i) = 2) T'(p = 1) . ee (1 = a/v) («/v)"-P-4 dv 
or, for √v,
ae Cate Nee ONG: > + © (2S 


a result due to Wilks—cf. equation (28.62). 


28.32. The distribution of the u’s does not immediately provide a test of significance 
of the canonical correlations, except when there is only one of them. The criterion 


$$v = \prod (1 - u) \qquad (28.140)$$


is sometimes useful in the general case for testing simultaneously the departure of the
u's from zero. Cf. Exercises 28.11 and 28.12.
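
As an illustration of how the criterion (28.140) might be used, the sketch below computes v from a set of squared sample canonical correlations and forms the large-sample statistic quoted in Exercise 28.12. It is a hedged example, not part of the original text: the function names are hypothetical, and it assumes that the z of Exercise 28.12 may be identified with v.

```python
import math


def alienation_criterion(u):
    """Return v = product of (1 - u_i); the u_i are squared canonical correlations."""
    v = 1.0
    for u_i in u:
        if not 0.0 <= u_i < 1.0:
            raise ValueError("each u must satisfy 0 <= u < 1")
        v *= 1.0 - u_i
    return v


def bartlett_statistic(u, n, p, q):
    """Statistic -{n - 1 - (p + q + 1)/2} log v and its degrees of freedom pq.

    Follows the approximation stated in Exercise 28.12, assuming z = v.
    """
    multiplier = n - 1 - 0.5 * (p + q + 1)
    return -multiplier * math.log(alienation_criterion(u)), p * q


if __name__ == "__main__":
    # Hypothetical roots and sample size, for illustration only.
    statistic, dof = bartlett_statistic([0.25, 0.04], n=50, p=2, q=3)
    print(statistic, dof)
```

The statistic would then be referred to the χ² distribution with pq degrees of freedom; large values point to departure of the u's from zero.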


NOTES AND REFERENCES 


Among earlier papers in which various aspects of the multivariate problem began to
be studied, reference may be made to Karl Pearson (1926b) on the “coefficient of racial
likeness” and Ragnar Frisch (1929), who independently arrived at the dispersion matrix
and proposed to call its determinant in standard measure the “scatterance”. Reference
to the papers by Wishart (1928), Wishart and Bartlett (1933c) and Hotelling (1931) on the
generalised product-moment distribution and the generalised “Student” ratio has been
made in the text.

In more recent literature three lines of development are discernible :— 

(a) American writers have developed the theory of canonical correlation and multiple
analysis mainly on algebraic and analytical lines. See Hotelling (1933, 1936b), Wilks
(1932e, 1934, 1935b, 1935c, 1936, 1943), Girshik (1939), and Madow (1938).

(b) English schools have investigated the theory of discriminant functions and developed
the sampling theory of canonical roots. See R. A. Fisher (1936a, b, 1938c, 1939,
1940d), P. L. Hsu (1938c, 1939b, 1941a, c, d), and for illustrative material Martin (1936),
Barnard (1935), Fairfield Smith (1936) and Wallace and Travers (1938). See also Bartlett
(1934b, 1938c, 1939b, c, 1941), E. S. Pearson and Wilks (1933b), Welch (1939b), Lawley
(1938) and Bishop (1939). Simaika (1941) has proved that tests based on Hotelling’s T
and the multiple correlation coefficient are uniformly most powerful in the class depending
on a single parameter.

(c) The Indian school, whose contribution has not been referred to in this chapter,
has developed some interesting work based on what is known as the D²-statistic. See
Mahalanobis (1930, 1936a), Mahalanobis, Bose and Roy (1936), R. C. Bose (1936a), R. C.
Bose and Roy (1938c), and later papers in Sankhyā. If, with two samples from p-variate
populations, $d_i$ is the difference of sample means for the ith variate, the studentised
D²-statistic is
$$D^2 = \frac{1}{p} a^{ij} d_i d_j,$$
where $a^{ij}$ refers to the reciprocal of the sample dispersion matrix. Bose and Roy have
shown that in normal samples this has the same distribution as one of Fisher’s forms for
the multiple correlation coefficient. The corresponding parameter for the population
$$\Delta^2 = \frac{1}{p} \alpha^{ij} \delta_i \delta_j,$$
is known as Mahalanobis’s generalised distance.


EXERCISES 
28.1. In a four-variate normal distribution show that the correlation between the
covariances $a_{12}$ and $a_{34}$ is
$$\frac{\rho_{13} \rho_{24} + \rho_{14} \rho_{23}}{\{(1 + \rho_{12}^2)(1 + \rho_{34}^2)\}^{1/2}}.$$


(Wishart, 1928.) 


28.2. For a pair of normal variates with correlation p, show that, defining v by 
N A12 
y= —____— _., 
01 02 (1 — p’) 
we have for the frequency function of v 
1 — p2)}™—1) ere 
——— 


f/m Qin} r(*S 


ee” Kina) 5 


for v > 0 and a similar expression with − v for v inside curly brackets if v < 0. Here
K is the Bessel function of second kind with imaginary argument.


(Wishart and Bartlett, 1933c. See also K. Pearson and others, 1929.) 


28.3. Show that if k sets of variates $a_{ij}^{(h)}$, h = 1 ... k; i, j = 1 ... p are each
distributed in Wishart’s form, with sample numbers $n_1, \ldots, n_k$, then the variates
$$a_{ij} = \sum_{h=1}^{k} a_{ij}^{(h)}$$
are also distributed in Wishart’s form with $n = \sum_{h=1}^{k} (n_h)$. (This follows readily from the
characteristic function. It is a generalisation of the additive properties of χ².)


28.4. If a sample of n is chosen from a p-variate normal population, the variates 
being grouped into k& classes %, 4%... %p3 U4, ++s 


. Lp», consider the function— 


Xp.+Ds ; ePerstg Lp, + «+ PeR-1+1 


where 1, = 1 and r‘") is zero if the variates belong to different classes and equals the cor- 
relation 7r,; if they belong to the same class. 
By considering the function 
‘ne 


n—2 n—1t 
Pe Pe 
poe it It F Hil . 


gee | Sen ae a a 
t=1 il r( 5 +r] il r( 5 ) 


(Wilks, 1935. The distribution provides a test of the independence of k sets of normal variates.) 


show that 


28.5. As a particular case of the last exercise, show that if a single variate x, is 
independent of a second set x, ... %,, then— 


key ay) 


ene) 


and hence find the distribution of the multiple correlation coefficient when the parent 
coefficient is zero. 


He 


(Wilks, 19350.) 


28.6. Show algebraically that Hotelling’s T is invariant under linear transformations
of the p variates. 


28.7. If the determinantal equation (28.83) with p = q has a double root equal to
zero, show that for large samples the value of r corresponding to the canonical correlation
is given by omitting all terms in the determinant when expanded, except those in λ² and
λ⁰. Noting that the latter is a perfect square, show that r is the ratio of a polynomial
in the sample dispersions to a non-vanishing function regular in the neighbourhood of
zero. Hence show that (28.107) holds when ρ = 0.

(Hotelling, 1936b.) 


28.8. In the notation of 28.23, if 


Aer, | B= | a; | 
0 at . | Ong Oui 
C = |-------- W-------- : D = |------- enna 
Cin °° Fy | Om 1 


show that the vector correlation coefficient K defined by 


(— 1) C 
i 
ae 08 


and the square of the vector alienation coefficient Z defined by 


D 
Lo 
AB 
are invariant under linear transformations of the variate. Also that 


$$K = \pm \rho_1 \rho_2 \ldots \rho_p$$
$$Z = (1 - \rho_1^2)(1 - \rho_2^2) \ldots (1 - \rho_p^2)$$


where the p’s are canonical correlations. 
(Hotelling, 1936b.) 


28.9. In the notation of the previous exercise, k and z being the sample values of 
K and Z, show that if the population canonical correlations are all distinct, 


__1lp, Vf —p-)? 
var k = — K De peers 


i=1 
p 
4 2 
var z ay D7 
1 


j 2 - : 
cov (k,z) = — ep) (1 — pi). 
In particular, when p = 2, 
vark =~ {(1—K*)*—~Z(1+K)} 


2 
varz = (1% + 


cov (k, 2) = — “KZ (1 +Z —K?), 
(Hotelling, 1936b.)




28.10. In the previous exercise, with p = q = 2, show that, in standard measure, 
Lo Tis Tea — Tig Nog 
{ ee ee) (ee) } 
and hence derive a test of significance of the “tetrad difference” $r_{13} r_{24} - r_{14} r_{23}$.
(Hotelling, 1936b.) 


28.11. In the notation of Exercise 28.9, show that 


: r(stettat) p(t) r(*S) 
BAe) =i 


inl} p gt+1l—z r *=i=')r n+a+t2B—7% ; 
2 2 Pe 


(Girshik, 1939.) 


28.12. Find the characteristic function of − log z, where z is defined as in the
previous exercise, and hence show that − n log z or, to a better approximation,
−{n − 1 − ½(p + q + 1)} log z tends to be distributed as χ² with pq degrees of freedom
when n is large.

(Bartlett, 1938c.) 


CHAPTER 29 


TIME-SERIES—(1) 


29.1. A time-series, as its name indicates, is a series of values assumed by a variable 
at different points of time. We shall consider only cases where the variable is univariate 
and shall denote its value at time t by $u_t$. The study of such series forms an important
branch of statistics because the majority of types of time-variation encountered in practice
are not of the regular functional type in which $u_t$ can be represented exactly by a mathematical
function of t, but present in some degree those irregularities of a random character
which can only be discussed in terms of probability. One of our main problems, in fact, 
will be to isolate systematic from casual effects in the series so as to be able to study 
them separately. 


29.2. In general it is possible to observe a time-variable at any instant, and thus 
the temporal intervals between successive members of the series need not be the same. 
Practice and theory alike, however, usually require the observations to occur at regular 
intervals, and in the sequel we shall assume, unless the contrary is specifically stated, that 
the interval from each observation to the next is the same throughout the series. As 
a matter of convenience we may take this interval as our time-unit and write the series as 


$$u_1, u_2, \ldots, u_t, \ldots \qquad (29.1)$$


where t must be an integer. Where a series extends backwards and forwards from some
given point which we wish to regard as origin we may write it as


$$\ldots, u_{-2}, u_{-1}, u_0, u_1, u_2, \ldots \qquad (29.2)$$


In this chapter and the next we shall study the way in which $u_t$ varies with t, such variation
being in general of the stochastic type, that is to say, involving random variables.


Some Examples of Time-series 

29.3. Tables 29.1 to 29.5 provide some examples of the kind of variation encountered 
in practice. Table 29.1 (illustrated in Fig. 29.1) gives the annual yields per acre of barley 
in England and Wales from 1884 to 1939. Table 29.2 (Fig. 29.2) shows the human popula- 
tion of England and Wales at ten-yearly intervals from 1811 to 1931. Table 29.3 (Fig. 29.3) 
gives the sheep population of England and Wales for each year from 1867 to 1939. 
Table 29.4 (Fig. 29.4) gives the annual rainfall in London for each year from 1813 to 1912. 
Table 29.5 (Fig. 29.5) gives the average egg-production per laying hen in the U.S.A. for 
each month of the years 1938 to 1940. 




TABLE 29.1
Annual Yields per Acre of Barley in England and Wales from 1884 to 1939.

(Data from the Agricultural Statistics.)

Yield per acre in cwts.

Year.   Yield.    Year.   Yield.    Year.   Yield.    Year.   Yield.
1884    15.2      1898    16.9      1912    14.2      1926    16.0
1885    16.9      1899    16.4      1913    15.8      1927    16.4
1886    15.3      1900    14.9      1914    15.7      1928    17.2
1887    14.9      1901    14.5      1915    14.1      1929    17.8
1888    15.7      1902    16.6      1916    14.8      1930    14.4
1889    15.1      1903    15.1      1917    14.4      1931    15.0
1890    16.7      1904    14.6      1918    15.6      1932    16.0
1891    16.3      1905    16.0      1919    13.9      1933    16.8
1892    16.5      1906    16.8      1920    14.7      1934    16.9
1893    13.3      1907    16.8      1921    14.3      1935    16.6
1894    16.5      1908    15.5      1922    14.0      1936    16.2
1895    15.0      1909    17.3      1923    14.5      1937    14.0
1896    15.9      1910    15.5      1924    15.4      1938    18.1
1897    15.5      1911    15.5      1925    15.3      1939    17.5


[Fig. 29.1. Graph of the Data of Table 29.1 (Barley Yields per Acre). Horizontal axis: Years, 1880 to 1940; vertical axis: Yield (cwt. per acre).]


TABLE 29.2
Population of England and Wales at Ten-Yearly Intervals from 1811 to 1931.
(Data from the Registrar-General’s Statistical Review, 1933, Part II.)

Year.   Population (millions).
1811    10.16
1821    12.00
1831    13.90
1841    15.91
1851    17.93
1861    20.07
1871    22.71
1881    25.97
1891    29.00
1901    32.53
1911    36.07
1921    37.89
1931    39.95

[Fig. 29.2. Graph of the Data of Table 29.2 (Population of England and Wales). Horizontal axis: Years, 1811 to 1931; vertical axis: Population (millions).]


TABLE 29.3

Sheep Population of England and Wales for each Year from 1867 to 1939.

(Data from the Agricultural Statistics.)

Population in units of 10,000.

Year.   Popn.     Year.   Popn.     Year.   Popn.     Year.   Popn.
1867    2203      1886    1892      1905    1823      1924    1484
1868    2360      1887    1919      1906    1843      1925    1597
1869    2254      1888    1853      1907    1880      1926    1686
1870    2165      1889    1868      1908    1968      1927    1707
1871    2024      1890    1991      1909    2029      1928    1640
1872    2078      1891    2111      1910    1996      1929    1611
1873    2214      1892    2119      1911    1933      1930    1632
1874    2292      1893    1991      1912    1805      1931    1775
1875    2207      1894    1859      1913    1713      1932    1850
1876    2119      1895    1856      1914    1726      1933    1809
1877    2119      1896    1924      1915    1752      1934    1653
1878    2137      1897    1892      1916    1795      1935    1648
1879    2132      1898    1916      1917    1717      1936    1665
1880    1955      1899    1968      1918    1648      1937    1627
1881    1785      1900    1928      1919    1512      1938    1791
1882    1747      1901    1898      1920    1338      1939    1797
1883    1818      1902    1850      1921    1383
1884    1909      1903    1841      1922    1344
1885    1958      1904    1824      1923    1384
[Fig. 29.3. Graph of the Data of Table 29.3 (Sheep Population). Horizontal axis: Years, 1865 to 1945.]

TABLE 29.4

Total Annual Rainfall at London in Inches, for each Year from 1813 to 1912.

(Data from D. Brunt, Phil. Trans. A, 225, 247, 1925.)

Rainfall in inches.

Year.   Rainfall.   Year.   Rainfall.   Year.   Rainfall.   Year.   Rainfall.
1813    23.56       1838    21.63       1863    21.59       1888    27.74
1814    26.07       1839    27.49       1864    16.93       1889    23.85
1815    21.86       1840    19.43       1865    29.48       1890    21.23
1816    31.24       1841    31.13       1866    31.60       1891    28.15
1817    23.65       1842    23.09       1867    26.25       1892    22.61
1818    23.88       1843    25.85       1868    23.40       1893    19.80
1819    26.41       1844    22.65       1869    25.42       1894    27.94
1820    22.67       1845    22.75       1870    21.32       1895    21.47
1821    31.69       1846    26.36       1871    25.02       1896    23.52
1822    23.86       1847    17.70       1872    33.86       1897    22.86
1823    24.11       1848    29.81       1873    22.67       1898    17.69
1824    32.43       1849    22.93       1874    18.82       1899    22.54
1825    23.26       1850    19.22       1875    28.44       1900    23.28
1826    22.57       1851    20.63       1876    26.16       1901    22.17
1827    23.00       1852    35.34       1877    28.17       1902    20.84
1828    27.88       1853    25.89       1878    34.08       1903    38.10
1829    25.32       1854    18.65       1879    33.82       1904    20.65
1830    25.08       1855    23.06       1880    30.28       1905    22.97
1831    27.76       1856    22.21       1881    27.92       1906    24.26
1832    19.82       1857    22.18       1882    27.14       1907    23.01
1833    24.78       1858    18.77       1883    24.40       1908    23.67
1834    20.12       1859    28.21       1884    20.35       1909    26.75
1835    24.34       1860    32.24       1885    26.64       1910    25.36
1836    27.42       1861    22.27       1886    27.01       1911    24.79
1837    19.44       1862    27.57       1887    19.21       1912    27.88


[Fig. 29.4. Graph of the Last 50 Terms of the Data of Table 29.4 (Rainfall). Horizontal axis: Years, 1860 to 1910; vertical axis: Annual Rainfall (inches).]


TABLE 29.5

Average Number of Eggs per Laying Hen in the U.S.A. for each Month of the Years 1938-1940.

(Data from Report of the Bureau of Agricultural Economics, U.S. Dept. of Agriculture, on the
Poultry and Egg Situation, March, 1941.)

Year.   Jan.   Feb.   Mar.   Apr.   May.   June.  July.  Aug.   Sept.  Oct.   Nov.   Dec.
1938     7.9    9.9   15.4   17.5   17.3   14.9   13.6   11.8    9.4    7.5    5.9    6.4
1939     8.0    9.7   14.9   17.0   17.0   14.6   13.2   11.7    9.3    7.4    6.0    6.8
1940     7.2    9.0   14.4   16.5   17.0   14.8   13.4   11.8    9.7    7.9    6.2    6.8
[Fig. 29.5. Graph of the Data of Table 29.5 (Egg Production). Horizontal axis: Date, March 1938 to December 1940; vertical axis: Average Number of Eggs per Hen.]


These series are fairly typical of the kind of material with which our theory has to 
deal. The data of Table 29.1 (barley yields) present a very irregular fluctuation, and so 
far as the eye can see (which is not a decisive test) there is no systematic oscillation and no 
regular movement in mean yields over the period. By contrast, Table 29.2 (human popula- 
tion) shows a relatively smooth movement without apparent oscillation. Table 29.3 (sheep 
population) combines a general decline in numbers with marked oscillatory effects which,
though not perfectly regular, appear to be systematic to some extent. Tables 29.4 and 
29.5 exhibit an oscillatory effect which is definitely seasonal for the latter and much less 
regular for the former, neither indicating a variation, in the periods covered, of the average 
values about which the series oscillate. 


29.4. It must not be overlooked that our method of determining the values of the 
series at fixed equal intervals of time may suppress evidence of oscillatory movements 
which have a period equal to those intervals or to some sub-multiple of them. Suppose, 
for instance, that there was a systematic oscillation in the English population expressible 


ANALYSIS OF TIME-SERIES 369 


by a harmonic compcnent with period of exactly 10 years, or exactly 5 years, or exactly 
33 years. Clearly, by observing the series at 10-yearly intervals we should never find any 
evidence of this effect, for it would contribute exactly the same amount to each observation, 
without oscillation. In the population case, of course, we have collateral evidence to 
indicate that no such oscillation exists, but where nothing is known of the series otherwise 
we can never exclude the possibility of a period exactly equivalent to our time-interval. 
Sometimes, in fact, we know that it is there, and choose our interval so as to exclude the 
oscillation from consideration. For instance, in our sheep population we know that there 
is a seasonal effect within the year, which is not brought out in Table 29.3 because the
sheep census is taken on June 4th each year; and again, in the rainfall data of Table 29.4 
we have taken as representing the year the whole rainfall within the year, knowing quite 
well that rainfall is seasonal to some extent, even in London. 
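
The point about suppressed periods can be checked numerically. The sketch below is an illustration only and not part of the original text; the function names, the unit amplitude and the phase value are assumptions. It samples a harmonic term at 10-yearly intervals and measures how much the sampled values vary.

```python
import math


def harmonic(t, period, amplitude=1.0, phase=0.7):
    """Value at time t of a harmonic component with the given period."""
    return amplitude * math.sin(2.0 * math.pi * t / period + phase)


def sampled_spread(period, interval=10, n_obs=13):
    """Range (max minus min) of the harmonic when observed every `interval` years."""
    values = [harmonic(k * interval, period) for k in range(n_obs)]
    return max(values) - min(values)


if __name__ == "__main__":
    # Periods equal to the interval or a sub-multiple of it, plus one that is not.
    for period in (10.0, 5.0, 10.0 / 3.0, 7.0):
        print(period, sampled_spread(period))
```

For periods of 10, 5 and 3⅓ years the spread is zero apart from rounding error, so the component adds the same amount to every observation; a period of 7 years, which is not a sub-multiple of the interval, shows up clearly.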


29.5. A general survey of these and similar series suggests that the typical time- 
series may be regarded as composed of three parts :— 

(a) a trend, or long-term movement;

(b) an oscillation about the trend of greater or less regularity;

(c) a “random”, “irregular” or “unsystematic” component.

It is customary to regard the series as composed of these elements superposed one on 
another ; that is to say, we consider the movement of the series as the sum of three dif- 
ferent components which may be generated by different causal systems. Particular series, 
of course, need not exhibit them all. That of Table 29.2 (human population) seems 
to be almost entirely trend, with perhaps a small unsystematic residual, whereas that of 
Table 29.5 (egg production) appears to be entirely oscillatory, and very regularly so. 
But some series at least exhibit all three. 
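
The additive scheme described above can be sketched by building an artificial series from its three parts. This is a minimal illustration under stated assumptions, not a method taken from the text: the linear trend, the sine oscillation, the normal random component and all parameter values are hypothetical choices.

```python
import math
import random


def simulate_series(n, slope=0.05, amplitude=1.0, period=12, noise_sd=0.3, seed=1):
    """Return (series, trend, oscillation, irregular) for t = 0, 1, ..., n - 1.

    The series is the term-by-term sum of the three components.
    """
    rng = random.Random(seed)
    trend = [slope * t for t in range(n)]
    oscillation = [amplitude * math.sin(2.0 * math.pi * t / period) for t in range(n)]
    irregular = [rng.gauss(0.0, noise_sd) for _ in range(n)]
    series = [a + b + c for a, b, c in zip(trend, oscillation, irregular)]
    return series, trend, oscillation, irregular


if __name__ == "__main__":
    series, trend, oscillation, irregular = simulate_series(48)
    for t in range(6):
        print(t, round(series[t], 3), round(trend[t], 3),
              round(oscillation[t], 3), round(irregular[t], 3))
```

Setting `amplitude` and `noise_sd` to zero gives a series that is entirely trend, and setting `slope` and `noise_sd` to zero gives one that is entirely oscillatory, which mirrors the two extreme cases mentioned in the text.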


29.6. The primary problem of time-series analysis from the statistical viewpoint is 
to isolate the three factors for individual study, and in this chapter and the next we shall 
be mainly concerned with various methods of carrying out the necessary analysis. Before 
proceeding, however, we must look a little more closely into the reality of the effects which 
we are investigating and the basis on which we assume that the analysis is legitimate. 


29.7. Perhaps the easiest component to understand and to remove from the series 
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon 
external to the main body of causal influences at work upon it. The oscillation in egg- 
production in Table 29.5, for instance, reflects the rhythm in the reproductive process 
which is found among birds in virtue, ultimately, of the fact that the earth goes round 
the sun once a year. Strictly speaking, we ought to confine the word “seasonal” to those
effects which are annual in period ; but where no confusion is likely to arise we can apply 
the same word and the same ideas to any phenomenon generated by strictly periodic natural 
processes, such as “‘ spring” and “ neap ” variation in tides or daily variation in tempera- 
ture. We must, however, be careful about extending the notion of seasonality to phenomena 
which are not demonstrated beyond reasonable doubt to depend on strictly periodic stimuli. 
For instance, it would be going too far, in the present state of our knowledge, to speak of 
sunspot variation as seasonal in this sense, and much too far to speak of seasonality in 
crop-yields as determined by sunspots, even if the relation between the two were estab- 
lished. We shall return to this point below when defining what we mean by a “ cycle” 
as distinct from an “ oscillation ”’. 



29.8. As we noted in 29.4, the seasonal effect may already be removed from the 
series by the way in which the data are specified. Where we ourselves have any choice 
in the determination of the data, we may eliminate seasonality in the same way, namely, 
by selecting for measurement of the series a point of time which is fixed in relation to the 
year, such as June 4th for the agricultural returns of England and Wales, or by averaging 
over the year, or (what is much the same thing) by cumulating the series over the year, 
as for instance with rainfall data. 
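
The three devices just described (a fixed date in the year, averaging over the year, cumulating over the year) can be tried on the monthly figures of Table 29.5. The following sketch uses the 1938 and 1939 rows of that table; the function names are hypothetical and the code is an illustration rather than a procedure given in the text.

```python
# Monthly figures for 1938 and 1939 from Table 29.5 (eggs per laying hen).
EGGS = {
    1938: [7.9, 9.9, 15.4, 17.5, 17.3, 14.9, 13.6, 11.8, 9.4, 7.5, 5.9, 6.4],
    1939: [8.0, 9.7, 14.9, 17.0, 17.0, 14.6, 13.2, 11.7, 9.3, 7.4, 6.0, 6.8],
}


def annual_totals(monthly):
    """Cumulate the series over each year."""
    return {year: sum(values) for year, values in monthly.items()}


def annual_means(monthly):
    """Average the series over each year."""
    return {year: sum(values) / len(values) for year, values in monthly.items()}


def same_month_each_year(monthly, month_index):
    """Take one observation per year at a fixed point of the year (0 = January)."""
    return {year: values[month_index] for year, values in monthly.items()}


if __name__ == "__main__":
    print(annual_totals(EGGS))
    print(annual_means(EGGS))
    print(same_month_each_year(EGGS, 5))  # June of each year
```

Within a year the monthly figures range over more than ten eggs per hen, whereas the annual totals for the two years differ by less than two, so each of the three derived series is free of the seasonal swing.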


29.9. The concept of trend is more difficult to define. Generally, one thinks of it 
as a smooth broad motion of the system over a long term of years, but “long ” in this con- 
nection is a relative term, and what is long for one purpose may be short for another. For 
example, if we were examining rainfall records over a hundred years a slow rise from the 
beginning of the period to the end would be regarded as a trend ; but if we possessed records 
for two thousand years (and the rings in some of the giant redwood trees give an index of 
climatic conditions for periods of this order) the rise over a particular century might appear 
as part of a slow oscillatory movement, so that any inference from the “ trend ” in a par- 
ticular century to the effect that the weather was likely to continue becoming wetter and 
wetter might be quite false. What inference we should make in practice would depend 
on what we were trying to do. If we were engineers designing a water-supply system and 
wished to provide against droughts of reasonable extent, we might perhaps assume that the 
trend would last as long as our works and proceed accordingly ; but if we were attempting 
to study climatic changes over the face of the earth for geological periods of time we should 
accept the continuance of the trend with the greatest reserve or, more probably, should 
reject it on collateral grounds. 
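
The dependence of "trend" on the length of record can be illustrated with an artificial case. The sketch below is hypothetical throughout (a pure sine movement with a period of 2,000 years, unit amplitude, annual observations) and is not drawn from any data in the text.

```python
import math


def slow_oscillation(t, period=2000.0, amplitude=1.0):
    """A slow oscillatory movement observed at year t."""
    return amplitude * math.sin(2.0 * math.pi * t / period)


def is_strictly_increasing(values):
    """True when every value exceeds the one before it."""
    return all(b > a for a, b in zip(values, values[1:]))


# One century of the record compared with the full two thousand years.
century = [slow_oscillation(t) for t in range(100)]
full_record = [slow_oscillation(t) for t in range(2000)]

if __name__ == "__main__":
    print(is_strictly_increasing(century))      # looks like a steady rise
    print(is_strictly_increasing(full_record))  # the rise does not persist
```

Seen over the single century the movement is a steady rise and would pass for a trend; over the whole record the same movement is one swing of an oscillation.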


29.10. However long a series may be, we can never be certain, and often not even 
reasonably sure, that a trend in it is not part of a slow oscillation, except of course when 
the series has terminated (as might, for instance, be the case if we were considering the 
lengths of reigns of the Roman Emperors). In speaking of a trend, therefore, we must 
bear in mind the length of the series to which our statement refers. Perhaps it would be 
more accurate to speak of slow or quick movements rather than of trend and oscillation, 
but even so the distinction between the two would remain a matter of subjective judgment 
to some extent. 


29.11. When seasonal variation and trend have been removed from the data we 
are left with a series which will present, in general, fluctuations of a more or less regular 
kind. Fig. 29.1 represents the kind of series we obtain, since it has no components of 
trend or seasonality. The question then arises, is this residual series systematic in the 
sense that its values can be represented as a function of the time? Or, on the other hand, 
are the values random in the sense that they could occur, in the observed order, by random 
sampling from a homogeneous population ? Or again, is there some possibility intermediate 
between complete functional variation and complete randomness? The search for syste- 
matic effects in residual fluctuation gives rise to several techniques of analysis, the object 
of which is to detect whether any part of the series is subject to law, and therefore predict- 
able, and whether any part is purely haphazard. The former part we shall call systematic, 
and it will be referred to as an “ oscillation ” (not a “ cycle”, which is a very special case 
of an oscillation, as we shall see later). The remainder of the series we shall call the
unsystematic component, and refer to its movements as “random”. When a series is a mixture
of oscillation and random movement it will not cause any inconvenience to refer to the
up-and-down movement generally as fluctuation before we have analysed it into its con- 
stituents ; that is to say, we may speak of fluctuation without prejudice to the possibility 
of detecting oscillatory movements in it. 

In this chapter we study trend and random residuals. In the next chapter we shall 
deal with oscillatory and cyclical components. 


29.12. The logician or the economist who wants to be difficult can always maintain 
that, although any series can be separated into our three specified components as a matter 
of mathematical or statistical analysis, the results throw little or no light on the causal 
influences at work to produce the series. To such a critic we have to concede, I think, 
that in carrying out the analysis we have at the back of our minds the strong possibility 
that the three elements are due to independent causal systems. If he refuses to accept 
this view—and some economists do—we can only invite him to produce a better statistical 
method. 

Possibly the reader will feel, on reaching the end of Chapter 30, that we have not been 
wasting our time, and that our methods do throw light on the way in which time-series 
behave. If not, he should consult some of the references and see whether he finds them 
statistically more satisfying. 


Determination of Trend 


29.13. It is an essential part of the concept of trend that the movement over fairly 
long periods is smooth. This means that we can represent the trend component, at least 
locally, by a polynomial in the time element t. Thus, given the series $u_t$ we may, in the
first instance, seek for some polynomial


$$u_t = a_0 + a_1 t + \dots + a_p t^p \tag{29.3}$$


which will give an account of the trend movement. By taking p great enough we can, of 
course, obtain as close a representation as we like to a finite series; and how large we 
take p is a matter for decision in particular cases. 

If the polynomial is fitted to the whole series by least squares, it evidently gives the
curvilinear regression line of $u_t$ on the variable t. This method would then lead to the
fitting of regressions in the manner of Chapter 22, and we need not repeat here what has 
been said on the subject in that chapter. In Example 22.7 we did, in fact, fit a quartic 
to the population data of Table 29.2 and found a good fit. 


29.14. It is, however, clear that to obtain a satisfactory trend-curve for data such
as that of Table 29.3 (sheep population), we should have to take a polynomial of rather 
high order. This may appear somewhat artificial and in any case the coefficients of such 
a polynomial, being based on high-order moments, would be very unstable from the sampling 
viewpoint. A more practical objection, though by no means an unimportant one, is that, 
if we add another term to the series, as for example if we are keeping an annual series up 
to date from year to year, the work of fitting has to be done afresh each time. Moreover, 
the trend-line may be affected throughout its length. When, therefore, the series has no 
very obvious trend such as that of Table 29.2 it is more convenient to use the simpler 
methods described below. 




Moving Averages 


29.15. An alternative to finding a polynomial which will represent the whole series 
is to determine a polynomial which will represent a part of it, and to use different poly- 
nomials for different parts. The simplest method, and one which forms the basis of the 
majority of methods of trend fitting, is to take the first m terms (m being chosen at will), 
fit a polynomial of order p, not greater than m − 1, to them, and use that polynomial to
determine the value in the middle of its range; then to repeat the operation with the m 
terms from the second to the (m + 1)th, and so on, moving on one term at each stage. 
Unless other considerations require it, we take m to be odd, so that the middle point of 
the range corresponds to a value which is actually observed. Otherwise the middle point 
falls half-way between two observed values, or we have to use some value of the fitted 
polynomial other than the middle point, which results in a loss of useful symmetry. 


29.16. Suppose, then, that the number of terms is chosen to be odd and is denoted,
with a slight change of notation, by 2m + 1. Without loss of generality we may denote
the terms by $u_{-m}, u_{-(m-1)}, \dots, u_0, \dots, u_{m-1}, u_m$. If we choose to fit to them a
polynomial of the pth order (29.3) we may, in the usual way, determine the coefficients by
least squares, i.e. solve the equations


$$\frac{\partial}{\partial a_j} \sum_{t=-m}^{m} (u_t - a_0 - a_1 t - \dots - a_p t^p)^2 = 0, \qquad j = 0, 1, \dots, p, \tag{29.4}$$


which will give us equations typified by

$$\sum (t^j u_t) - a_0 \sum (t^j) - a_1 \sum (t^{j+1}) - \dots - a_p \sum (t^{j+p}) = 0. \tag{29.5}$$


Now the sums $\sum (t^j)$ are functions of m only. Thus, if we solve (29.5) for $a_0$ we shall find
an equation of the form


$$a_0 = c_0 + c_1 u_{-m} + c_2 u_{-(m-1)} + \dots + c_{2m+1} u_m, \tag{29.6}$$


where the c’s depend on m and p, but not on the u’s.

Now $u_t$ assumes the value $a_0$ at t = 0 and hence this value, as given by (29.6), is the
value we require for the polynomial. As we see, this is equivalent to a weighted average
of the observed values, the weights being independent of which part of the series is taken.
Thus our process of fitting a trend-line consists of determining the constants c (which
depend on m and p and therefore give us a twofold element of choice) and then calculating,
for each consecutive set of (2m + 1) terms in the series, a value given by (29.6). If the
terms are $u_1, \dots, u_{2m+1}$, the calculated value will correspond to t = m + 1. There will
be no values corresponding to the m terms at the beginning and the m terms at the end.
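The derivation of the weights in (29.6) can be sketched in a few lines of code. The following is only an illustration under stated assumptions: the function name `moving_average_weights` is hypothetical, exact rational arithmetic is used so that the weights come out as fractions, and the normal equations (29.5) are solved by plain Gauss-Jordan elimination rather than by any method given in the text.

```python
from fractions import Fraction

def moving_average_weights(m, p):
    """Weights c_t (t = -m..m) with a_0 = sum(c_t * u_t) for a least-squares
    polynomial of degree p fitted to 2m + 1 consecutive terms (needs p <= 2m)."""
    ts = range(-m, m + 1)
    n = p + 1
    # Matrix of the normal equations (29.5): entry (j, i) is the sum of t**(j + i).
    A = [[Fraction(sum(t ** (j + i) for t in ts)) for i in range(n)] for j in range(n)]
    # Invert A by Gauss-Jordan elimination on the augmented matrix [A | I].
    aug = [row + [Fraction(int(r == c)) for c in range(n)] for r, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    first_row = aug[0][n:]  # first row of the inverse of A
    # a_0 = sum_j inv[0][j] * sum_t t**j * u_t, so the weight on u_t is sum_j inv[0][j] * t**j.
    return [sum(first_row[j] * t ** j for j in range(n)) for t in ts]

weights_7_cubic = moving_average_weights(3, 3)
```

The weights depend only on m and p, as the text states, so they need to be computed once and can then be slid along the whole series.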


Example 29.1 


Suppose we have a series and wish to fit a curve which best approximates to sets of 
seven points ; and suppose we regard a cubic as providing a satisfactory approximation. 
What are the weights of the moving average ? 

We have m = 3 and p = 3, and our polynomial is


$$u_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3.$$


Taking our origin at t = 0, we find, for equations (29.5), in virtue of the fact that $\sum (t^k) = 0$
for odd k,


$$\begin{aligned}
\sum (u) &= 7a_0 + 28a_2 \\
\sum (tu) &= 28a_1 + 196a_3 \\
\sum (t^2 u) &= 28a_0 + 196a_2 \\
\sum (t^3 u) &= 196a_1 + 1588a_3
\end{aligned}$$


giving, for $a_0$,


$$\begin{aligned}
a_0 &= \tfrac{1}{21} \{ 7 \sum (u) - \sum (t^2 u) \} \\
&= \tfrac{1}{21} \{ -2u_{-3} + 3u_{-2} + 6u_{-1} + 7u_0 + 6u_1 + 3u_2 - 2u_3 \}.
\end{aligned}$$


We may write this conveniently as

$$\tfrac{1}{21} [-2, 3, 6, 7, 6, 3, -2]$$


or, when symmetrical formulae are used, as in the present case, by

$$\tfrac{1}{21} [-2, 3, 6, \mathbf{7}, \dots],$$


denoting the middle term by heavy type.
To take a simple illustration. Suppose the series is given by the following values:


$u_t$: 0 1 8 27 64 125 216 343 512 729


We have, for the trend value at t = 4,

$$a_0 = \tfrac{1}{21} \{ (-2 \times 0) + (3 \times 1) + (6 \times 8) + (7 \times 27) + (6 \times 64) + (3 \times 125) - (2 \times 216) \} = \tfrac{1}{21} \{567\} = 27.$$

Similarly, at t = 6 we find


$$a_0 = \tfrac{1}{21} \{ (-2 \times 8) + (3 \times 27) + \dots - (2 \times 512) \} = 125.$$


In both cases the trend-value is equal to the actual value of the series, and this obviously
must be so when we note that we are fitting a cubic to the series

$$u_t = (t - 1)^3.$$

It will be observed that in this example we should have obtained the same value for
$a_0$ if we fitted quadratics instead of cubics; and generally the case p odd includes the
case of the next lowest (even) value of p, so that we need not give separate formulae for
even p.
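The arithmetic of this example is easy to check mechanically. The sketch below applies the seven-point weights to the illustrative series of cubes; the helper name `moving_average` is an assumption made for the illustration, and fractions are used so that the equality with the original series is exact rather than approximate.

```python
from fractions import Fraction

# Seven-point weights for a fitted cubic, as derived in Example 29.1.
WEIGHTS_7 = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]

def moving_average(series, weights):
    """Weighted moving average; the result has len(weights) - 1 fewer terms."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

series = [(t - 1) ** 3 for t in range(1, 11)]  # 0, 1, 8, ..., 729 for t = 1..10
trend = moving_average(series, WEIGHTS_7)       # trend values for t = 4, 5, 6, 7
```

Ten terms give only four trend values, the three terms at each end being lost, and each trend value coincides with the observed cube.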


29.17. Writing $a_0[k]$ for the value of $a_0$ calculated in the above manner for an average
of k successive terms, we find the following formulae up to p = 5. The reader may care 
to verify them for himself as an exercise. 




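Quadratic and Cubic

$$\begin{aligned}
a_0[5] &= \tfrac{1}{35} [-3, 12, \mathbf{17}, \dots] \\
a_0[7] &= \tfrac{1}{21} [-2, 3, 6, \mathbf{7}, \dots] \\
a_0[9] &= \tfrac{1}{231} [-21, 14, 39, 54, \mathbf{59}, \dots] \\
a_0[11] &= \tfrac{1}{429} [-36, 9, 44, 69, 84, \mathbf{89}, \dots] \\
a_0[13] &= \tfrac{1}{143} [-11, 0, 9, 16, 21, 24, \mathbf{25}, \dots] \\
a_0[15] &= \tfrac{1}{1105} [-78, -13, 42, 87, 122, 147, 162, \mathbf{167}, \dots] \\
a_0[17] &= \tfrac{1}{323} [-21, -6, 7, 18, 27, 34, 39, 42, \mathbf{43}, \dots] \\
a_0[19] &= \tfrac{1}{2261} [-136, -51, 24, 89, 144, 189, 224, 249, 264, \mathbf{269}, \dots] \\
a_0[21] &= \tfrac{1}{3059} [-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, \mathbf{329}, \dots]
\end{aligned} \tag{29.7}$$

Quartic and Quintic

$$\begin{aligned}
a_0[7] &= \tfrac{1}{231} [5, -30, 75, \mathbf{131}, \dots] \\
a_0[9] &= \tfrac{1}{429} [15, -55, 30, 135, \mathbf{179}, \dots] \\
a_0[11] &= \tfrac{1}{429} [18, -45, -10, 60, 120, \mathbf{143}, \dots] \\
a_0[13] &= \tfrac{1}{2431} [110, -198, -135, 110, 390, 600, \mathbf{677}, \dots] \\
a_0[15] &= \tfrac{1}{46189} [2145, -2860, -2937, -165, 3755, 7500, 10125, \mathbf{11063}, \dots] \\
a_0[17] &= \tfrac{1}{4199} [195, -195, -260, -117, 135, 415, 660, 825, \mathbf{883}, \dots] \\
a_0[19] &= \tfrac{1}{7429} [340, -255, -420, -290, 18, 405, 790, 1110, 1320, \mathbf{1393}, \dots] \\
a_0[21] &= \tfrac{1}{260015} [11628, -6460, -13005, -11220, -3940, 6378, 17655, 28190, 36660, 42120, \mathbf{44003}, \dots]
\end{aligned} \tag{29.8}$$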


29.18. Several methods have been proposed to simplify the arithmetic of fitting 
a trend-line by moving averages, the large numbers in some of the expressions in (29.7) 
and (29.8) involving considerable labour in straightforward application. The simplest, 
perhaps, is that of iterated averages. 


Suppose we take an average of sets of four with equal weights, a very simple process,
and then another average of the same kind of that average. If the primary series is
$u_t$, the result of the first operation will be to give a series


$$u'_1 = \tfrac{1}{4} (u_1 + u_2 + u_3 + u_4)$$

$$u'_2 = \tfrac{1}{4} (u_2 + u_3 + u_4 + u_5), \text{ etc.,}$$

and that of the second operation to give


$$u''_1 = \tfrac{1}{4} (u'_1 + u'_2 + u'_3 + u'_4) = \tfrac{1}{16} [u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7]. \tag{29.9}$$

We may write this symbolically as

$$\tfrac{1}{4} [1, 1, 1, 1] \times \tfrac{1}{4} [1, 1, 1, 1] = \tfrac{1}{16} [1, 2, 3, 4, \dots] \tag{29.10}$$


or, reserving the symbol $\tfrac{1}{k}[k]$ for a simple arithmetic mean of k terms, as


$$\tfrac{1}{16} [4]^2 = \tfrac{1}{16} [1, 2, 3, 4, \dots]. \tag{29.11}$$


Now compare the weights of the average derived in Example 29.1 for fitting a cubic
to seven points. Reduced to unit divisors we have for the weights of the latter
−0.0952, 0.1429, 0.2857, 0.3333, ...
and for the weights of (29.9)
0.0625, 0.1250, 0.1875, 0.2500, ...

The two are not identical, but they follow the same sort of course and it might be possible 
to regard the latter as an approximation to the former. (We shall derive better approxi- 
mations presently, but this will serve for purposes of illustration.) Now the iterated 
summation resulting in (29.9) is much easier to carry out than the single weighted averaging 
process of Example 29.1. Generally, if we can find averages with simple integral weights, 
preferably unity, which will, in conjunction, give approximations to the more complicated 
weights of a single average, it is usually easier to use the iteration process. 
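Iterating two averages amounts to convolving their weight sequences, which is why the result of two averages of four is the triangular set of weights in (29.9). A minimal sketch, in which the helper name `convolve` is an illustrative assumption:

```python
def convolve(a, b):
    """Weights of the single average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Two moving sums of four; dividing by 4 * 4 = 16 gives the weights of (29.9).
iterated = convolve([1, 1, 1, 1], [1, 1, 1, 1])
iterated_weights = [w / 16 for w in iterated]

# Weights of the seven-point cubic average of Example 29.1, for comparison.
cubic_weights = [w / 21 for w in (-2, 3, 6, 7, 6, 3, -2)]
```

Both sets of weights sum to unity; the iterated set has no negative end weights, which is one reason it is only a rough approximation to the least-squares set.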


29.19. In the notation of finite differences, write


$$\Delta u_t = u_{t+1} - u_t, \tag{29.12}$$
$$E u_t = u_{t+1} = (1 + \Delta) u_t, \tag{29.13}$$
$$\delta u_t = u_{t+\frac{1}{2}} - u_{t-\frac{1}{2}}. \tag{29.14}$$


We have, for the second “central” difference $\delta^2$,


$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = u_{t+1} - 2u_t + u_{t-1}. \tag{29.15}$$


Writing

$$E = \exp (2i\phi) \tag{29.16}$$

we find, symbolically,

$$\delta^2 = E - 2 + E^{-1} = \exp (2i\phi) + \exp (-2i\phi) - 2 = -4 \sin^2 \phi. \tag{29.17}$$


Then

$$\sum_{t=-m}^{m} u_t = \sum_{t=-m}^{m} (E^t u_0) = \Big\{ 1 + 2 \sum_{j=1}^{m} \cos 2j\phi \Big\} u_0,$$

since the terms in $\sin 2j\phi$ vanish,

$$= \frac{\sin (2m + 1)\phi}{\sin \phi} \, u_0. \tag{29.18}$$

Thus

$$\begin{aligned}
\frac{1}{k} [k] u_0 &= \frac{1}{k} \frac{\sin k\phi}{\sin \phi} \, u_0 \\
&= \frac{1}{k} \Big\{ k - \frac{k (k^2 - 1)}{3!} \sin^2 \phi + \frac{k (k^2 - 1^2)(k^2 - 3^2)}{5!} \sin^4 \phi - \dots \Big\} u_0 \\
&= u_0 + \frac{k^2 - 1}{24} \delta^2 u_0 + \frac{(k^2 - 1)(k^2 - 3^2)}{1920} \delta^4 u_0 + \dots
\end{aligned} \tag{29.19}$$


This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and
its central differences.

If now our series is approximately represented by a cubic, so that fourth differences
vanish, we have


$$\frac{1}{k} [k] u_0 = u_0 + \frac{k^2 - 1}{24} \delta^2 u_0, \tag{29.20}$$


and this equation will in any case be true up to third differences. Similarly, for two iterated
averages we have, to the same order,


$$\frac{1}{kl} [k][l] u_0 = u_0 + \frac{1}{24} (k^2 + l^2 - 2) \delta^2 u_0, \tag{29.21}$$


and so on. We will use these results to derive two formulae in very general use by actuaries
for “graduating” a series, a process which is very similar to that of fitting a trend-line.


Example 29.2. Spencer’s 15-point Formula
Consider three successive averages with equal weights


$$\frac{1}{80} [4]^2 [5] \, u_0 = u_0 + \frac{1}{24} (4^2 + 4^2 + 5^2 - 3) \, \delta^2 u_0 = u_0 + \frac{9}{4} \delta^2 u_0.$$

We then have, to third differences,

$$u_0 = \frac{1}{80} [4]^2 [5] \Big( 1 - \frac{9}{4} \delta^2 \Big) u_0.$$

Substituting for $\delta^2$ the formula [1, −2, 1], as given by (29.15), we find


$$u_0 = \frac{1}{320} [4]^2 [5] [-9, 22, -9] \, u_0.$$


Now without affecting the order of the approximation we may add factors in $\delta^4$ or higher
central differences, and can simplify the numerical coefficients to some extent. Let us
add to the factor [−9, 22, −9] a term $-3\delta^4$ = [−3, 12, −18, 12, −3]. The result
is [−3, 3, 4, 3, −3], giving


$$u_0 = \frac{1}{320} [4]^2 [5] [-3, 3, 4, 3, -3] \, u_0.$$


This is Spencer’s 15-point formula. It covers sets of 15 consecutive terms, the weights
in full being


$$\frac{1}{320} [-3, -6, -5, 3, 21, 46, 67, \mathbf{74}, \dots].$$
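The full set of fifteen weights can be recovered by multiplying out the factors of the formula, that is, by convolving the five-term factor with one moving sum of five and two moving sums of four. The sketch below does this with integer numerators over the common divisor 320; the function names are illustrative assumptions.

```python
def convolve(a, b):
    """Weights of the single average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def spencer15_numerators():
    """Numerators (divisor 320) of the factors [4]^2 [5] [-3, 3, 4, 3, -3] multiplied out."""
    w = [-3, 3, 4, 3, -3]
    for k in (5, 4, 4):          # one sum of five, then two sums of four
        w = convolve(w, [1] * k)
    return w

numerators = spencer15_numerators()
```

Because convolution is commutative, the order of the factors in the loop does not affect the result, which is the point made about the order of iterations in the practical notes that follow.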


Example 29.3. Spencer’s 21-point Formula
In a similar way we find


$$\frac{1}{175} [5]^2 [7] = 1 + 4\delta^2,$$


giving, to third differences,

$$u_0 = \frac{1}{175} [5]^2 [7] (1 - 4\delta^2) \, u_0 = \frac{1}{175} [5]^2 [7] [-4, 9, -4] \, u_0.$$


We now add to the factor [−4, 9, −4] the expression

$$[-3, 12, -18, 12, -3] + [-\tfrac{1}{2}, 3, -7\tfrac{1}{2}, 10, -7\tfrac{1}{2}, 3, -\tfrac{1}{2}],$$


giving


$$u_0 = \frac{1}{350} [5]^2 [7] [-1, 0, 1, 2, 1, 0, -1] \, u_0.$$


This is Spencer’s 21-point formula.
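As a check on the construction, the factors of the 21-point formula can be multiplied out and the resulting weights applied to an arbitrary cubic. The sketch below is illustrative only: the function names and the particular cubic are assumptions, and exact fractions are used so that "reproduces a cubic exactly" can be tested as a strict equality.

```python
from fractions import Fraction

def convolve(a, b):
    """Weights of the single average equivalent to applying a and then b."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def spencer21_weights():
    """Weights of (1/350) [5]^2 [7] [-1, 0, 1, 2, 1, 0, -1], multiplied out."""
    w = [-1, 0, 1, 2, 1, 0, -1]
    for k in (5, 5, 7):
        w = convolve(w, [1] * k)
    return [Fraction(x, 350) for x in w]

weights = spencer21_weights()

# An arbitrary cubic, chosen only for illustration, evaluated at t = -10..10.
cubic = [2 - 3 * t + t ** 2 + 4 * t ** 3 for t in range(-10, 11)]
smoothed_middle = sum(w * y for w, y in zip(weights, cubic))  # graduated value at t = 0
```

The graduated middle value equals the value of the cubic at t = 0, since the weights sum to unity and their first, second and third moments about the middle term all vanish.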


29.20. A few practical points arising in the application of the foregoing formulae 
are worth mentioning. 

(a) The order in which the iterations are carried out is of course immaterial, as the 
reader can easily verify. It is therefore more convenient, as a rule, to carry out the more 
complicated operations first, while the numbers being handled remain small. For instance, 
in applying the Spencer 15-point formula we should carry out the moving average
[−3, 3, 4, 3, −3] first, then apply the simple average $\tfrac{1}{5}$[5], and then the two averages
of four. This does not apply if the series is short, inasmuch as there are fewer of the final 
than of the initial operations. 

(b) The use of a moving average of extent 2k + 1 involves the absence of k terms at 
the end and k terms at the beginning of the trend-series. If the original series is short the 
loss may be serious, and this effect sometimes restricts considerably the extent of the 
average which we are able to apply. 

(c) It is possible to remedy the deficiency at the ends of the series by special formulae,
but the values so derived have less reliability than those of the main trend-line, and on
the whole it seems better to accept the loss of 2k terms unless trend-values for the beginning
and end of the series are really essential.




(d) As yet we have given no guide as to the choice of most suitable values of m and p. 
In practice we do not usually require to fit curves of degree higher than five, and often 
a cubic is sufficient, as is assumed in the Spencer formulae. There is greater elasticity in 
the choice of m, but the point mentioned in (b) above requires m to be as small as possible,
consistent with other requirements. We shall see later in the chapter that the variate- 
difference method gives some further guide as to p, and that certain effects of trend-elimina- 
tion on random elements bear on the extent determined by m. 

(e) There is a voluminous literature on trend-fitting which appears to me out of pro- 
portion to the importance of the subject. It is not difficult to pursue inquiries on the 
above lines to the point of extreme apparent precision and great mathematical complexity, 
and perhaps such work is valuable where the series is fairly smooth and not disturbed 
seriously by sampling variation or superposed random fluctuation. But many of the 
series encountered in statistical practice will not bear the weight of great refinement in 
trend-fitting. The student will probably find that a knowledge of fitting by moving 
averages will be sufficient for all ordinary and many extra-ordinary purposes. 


The Effect of Trend-elimination on Other Components 
29.21. In Table 29.6 we have applied the Spencer 21-point formula to an artificial 
series obtained by adding a random element to a cubic. Specifically,


$$u_t = (t - 26) + \frac{1}{10} (t - 26)^2 + \frac{1}{100} (t - 26)^3 + \epsilon_t. \tag{29.22}$$


The component $\epsilon_t$ was taken from tables of random numbers and consists of samples from
a population in which all integral values from 0 to 99 are equally frequent. The various 
columns of the table illustrate the process of fitting, and we may note in passing that for 
a series as short as this it is convenient to leave the more difficult summations to the last 
as there are substantially fewer of them. 

Now we know that the Spencer formula will fit a cubic exactly, so that when we sub- 
tract the trend from the original series we ought to eliminate the systematic constituent 
entirely and be left with our random component, except in so far as we have rounded off the 
systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 29.6, 
remembering that the latter includes an element 49.5 equal to the mean of the random
component, shows that we do not do so. The reason is not far to seek. The moving 
average has acted on the random element itself and determined a trend-line in it. 

The results of applying the Spencer 21-point formula to the random element $\epsilon_t$ are
shown in column (11). We should expect that if the method were perfect the values in
this column would be 49.5, the mean of $\epsilon_t$, apart from irregular sampling effects; but
not only do the observed values deviate from this mean, they do so systematically, the
values having a small oscillatory movement which is shown as part of the trend.
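The staged summations of Table 29.6 can be reproduced for the first trend value as a check on the procedure. The sketch below uses the first 21 values of $u_t$ as read from column (4) of the table (cubic term plus random element); the helper names are illustrative assumptions, and the stages correspond to columns (5) to (9).

```python
def moving_sum(series, k):
    """Unweighted moving sum of k consecutive terms."""
    return [sum(series[i:i + k]) for i in range(len(series) - k + 1)]

def moving_weighted_sum(series, weights):
    """Moving sum with the given integer weights."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]

# u_t for t = 1..21, as read from column (4) of Table 29.6.
u = [-96, -90, -17, -32, -11, -59, 32, 28, 22, 62, 50,
     2, 79, -7, 74, 85, 15, -4, 61, 39, 1]

col5 = moving_sum(u, 5)                                        # column (5): [5] u_t
col6 = moving_sum(col5, 5)                                     # column (6): [5] (5)
col7 = moving_sum(col6, 7)                                     # column (7): [7] (6)
col8 = moving_weighted_sum(col7, [-1, 0, 1, 2, 1, 0, -1])      # column (8)
trend_t11 = col8[0] / 350                                      # column (9), at t = 11
```

Twenty-one terms yield a single trend value, centred on t = 11, which is why the trend columns of the table begin only at the eleventh term.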


29.22. This effect can assume considerable importance, particularly if we are elimina- 
ting trend so as to concentrate attention on oscillations. We proceed to examine it more 
closely. 

Suppose that we have a series composed of the sum of three parts, a trend $\phi_1(t)$, an
oscillatory term $\phi_2(t)$, and a random element $\phi_3(t)$, so that


$$u_t = \phi_1 + \phi_2 + \phi_3. \tag{29.23}$$


TABLE 29.6

Series given by Equation (29.22) with Trend-Line determined by a Spencer 21-point Formula.

Columns: (1) t; (2) cubic term; (3) ε_t; (4) u_t; (5) [5] u_t; (6) [5] (5); (7) [7] (6);
(8) [−1, 0, 1, 2, ...] (7); (9) (1/350) (8); (10) deviation u_t − (9); (11) graduation of ε alone.

(1)   (2)  (3)   (4)     (5)     (6)      (7)      (8)   (9)   (10)  (11)
  1  -119   23   -96
  2  -105   15   -90
  3   -92   75   -17    -246
  4   -80   48   -32    -209
  5   -70   59   -11     -87    -572
  6   -60    1   -59     -42    -241
  7   -51   83    32      12     162
  8   -44   72    28      85     413    2,233
  9   -37   59    22     194     670    3,801
 10   -31   93    62     164     844    5,120
 11   -26   76    50     215     957    5,984   14,352    41      9    67
 12   -22   24     2     186     996    6,642   15,470    44    -42    66
 13   -18   97    79     198   1,078    7,041   15,815    45     34    63
 14   -15    8    -7     233   1,026    7,145   15,676    45    -52    60
 15   -12   86    74     246   1,071    7,038   14,978    43     31    55
 16   -10   95    85     163   1,069    6,934   14,166    40     45    51
 17    -8   23    15     231     948    6,709   13,379    38    -23    47
 18    -7    3    -4     196     850    6,535   12,703    36    -40    43
 19    -6   67    61     112     892    6,408   12,169    35     26    40
 20    -5   44    39     148     853    6,363   12,102    35      4    39
 21    -4    5     1     205     852    6,446   12,279    35    -34    39
 22    -3   54    51     192     944    6,611   12,676    36     15    39
 23    -2   55    53     195   1,024    6,769   13,228    38     15    40
 24    -2   50    48     204   1,031    7,052   13,857    40      8    41
 25    -1   43    42     228   1,015    7,353   14,508    41      1    42
 26     0   10    10     212   1,050    7,610   15,120    43    -33    43
 27     1   74    75     176   1,136    7,923   15,634    45     30    44
 28     2   35    37     230   1,153    8,249   16,251    46     -9    44
 29     4    8    12     290   1,201    8,607   17,002    49    -37    45
 30     6   90    96     245   1,337    9,019   17,717    51     45    44
 31     9   61    70     260   1,357    9,424   18,499    53     17    44
 32    12   18    30     312   1,373    9,870   19,307    55    -25    43
 33    15   37    52     250   1,462   10,429   20,159    58     -6    42
 34    20   44    64     306   1,541   10,989   21,133    60      4    41
 35    24   10    34     334   1,599   11,679   22,417    64    -30    39
 36    30   96   126     339   1,760   12,539   23,797    68     58    38
 37    36   22    58     370   1,897   13,529   25,737    74    -16    37
 38    44   13    57     411   2,047   14,699   27,955    80    -23    36
 39    52   43    95     443   2,233   16,060   30,456    87      8    35
 40    61   14    75     484   2,452   17,570   33,334    95    -20    34
 41    71   87   158     525   2,711   19,353   36,716   105     53    34
 42    83   16    99     589   2,960   21,394
 43    95    3    98     670   3,270   23,690
 44   109   50   159     692   3,680   26,255
 45   124   32   156     794   4,088
 46   140   40   180     935   4,529
 47   158   43   201     997   5,017
 48   177   62   239   1,111
 49   198   23   221   1,180
 50   220   50   270
 51   244    5   249




If we determine the trend by a moving average, denoted by an operation T, then clearly

$$T u_t = T\phi_1 + T\phi_2 + T\phi_3. \tag{29.24}$$

Let us now suppose that our method of determining trend is perfect in the sense that
$T\phi_1 = \phi_1$. Then, on subtracting (29.24) from (29.23) to eliminate trend, we find


$$u_t - T u_t = (\phi_2 - T\phi_2) + (\phi_3 - T\phi_3). \tag{29.25}$$

The point of present interest is that the terms $T\phi_2$ and $T\phi_3$ in (29.25) may distort
the genuinely oscillatory parts of the residual series and induce spurious oscillatory
movements.


29.23. Consider the simple case when $\phi_2$ is a sine term, $\sin (\alpha + \lambda t)$, t being integral.
Since

$$\sum_{t=1}^{k} \sin (\alpha + \lambda t) = \frac{\sin \frac{1}{2} k\lambda}{\sin \frac{1}{2} \lambda} \, \sin \{ \alpha + \tfrac{1}{2} (k + 1) \lambda \}, \tag{29.26}$$

a simple moving average of k consecutive terms will result in a sine series of the same
period and phase as the original, but with the amplitude reduced by the factor

$$\frac{1}{k} \, \frac{\sin \frac{1}{2} k\lambda}{\sin \frac{1}{2} \lambda}. \tag{29.27}$$

Iteration q times will reduce the amplitude by the qth power of this factor.

Thus the term $T\phi_2$ will be small if k is large, q is large, or if $\frac{1}{2} k\lambda$ is a multiple of π,
that is, if the extent of the moving average is a period of the oscillation. But if λ is small
and kλ is small the amplitude is reduced very little and $\phi_2 - T\phi_2$ will largely disappear,
i.e. the moving average will partially obliterate the term in $\phi_2$. In this case, kλ being
small, the extent of the moving average is small compared with the period of the harmonic
term, that is to say the oscillation is a slow one. This result is what we should expect.
A slow oscillation is treated as a trend by the moving average and eliminated accordingly.
Generally, the moving average will emphasise the shorter oscillations at the expense of the
longer ones. Furthermore, if the extent of the average is slightly greater than the period,
the term (29.27) may have a negative sign, and consequently the difference from the trend
may somewhat exaggerate the true oscillations.

It is not so easy to exhibit the precise effect of the moving average when the weights
are unequal and the terms are not harmonic, but evidently the same kind of situation is
apt to arise.
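The behaviour of the factor (29.27) in the three cases just described can be checked numerically. The following sketch is illustrative only; the function names and the particular values of α, λ and k are assumptions chosen for the demonstration.

```python
import math

def amplitude_factor(k, lam):
    """Factor (29.27): (1/k) * sin(k * lam / 2) / sin(lam / 2)."""
    return math.sin(0.5 * k * lam) / (k * math.sin(0.5 * lam))

def averaged_sine(alpha, lam, k):
    """Simple average of sin(alpha + lam * t) over t = 1, ..., k."""
    return sum(math.sin(alpha + lam * t) for t in range(1, k + 1)) / k

alpha, lam, k = 0.7, 0.3, 5   # illustrative values only
direct = averaged_sine(alpha, lam, k)
predicted = amplitude_factor(k, lam) * math.sin(alpha + 0.5 * (k + 1) * lam)

period = 12
lam12 = 2 * math.pi / period
full_period = amplitude_factor(12, lam12)   # extent equal to the period
just_over = amplitude_factor(13, lam12)     # extent slightly greater than the period
slow = amplitude_factor(3, 0.01)            # k * lam small: a slow oscillation
```

With the extent equal to the period the factor vanishes, with the extent slightly greater it is negative, and for a slow oscillation it is close to unity, so that the oscillation is absorbed into the trend.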


29.24. Now consider the effect of a simple moving average (that is, one with equal
weights) on the residual element $\phi_3$ which we will suppose to be a random element $\epsilon_t$ with
variance v. For the term $T\phi_3$ we have

$$T\phi_3 = \frac{1}{k} \sum_{j=-[\frac{1}{2} k]}^{[\frac{1}{2} k]} \epsilon_{t+j}, \tag{29.28}$$

where $[\frac{1}{2} k]$ is the greatest integer which does not exceed $\frac{1}{2} k$. Consecutive values of $\epsilon_t$ are
independent, but consecutive values of $T\phi_3$ are not; for $T\phi_3(a)$ and $T\phi_3(b)$ have
k − (a − b) values of ε in common and are correlated if a − b < k. Thus the series $T\phi_3$
will be much smoother than $\phi_3$, and if we proceed to further averagings will become smoother
still. We have had an example of this effect in Table 29.6, and shall meet further
examples below.
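The overlap argument can be made quantitative. As a sketch under the assumptions already stated (k odd, equal weights, independent ε with common variance v), and writing d = |a − b|, a symbol introduced here only for illustration, the k − d shared terms give

$$\operatorname{var} (T\phi_3) = \frac{v}{k}, \qquad \operatorname{cov} \{ T\phi_3(a), T\phi_3(b) \} = \frac{(k - d) \, v}{k^2}, \qquad \operatorname{corr} \{ T\phi_3(a), T\phi_3(b) \} = 1 - \frac{d}{k}, \qquad 0 \le d < k.$$

For a simple average of five terms, for instance, neighbouring values of the averaged series have correlation 0.8 although the original terms are uncorrelated.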




29.25. The effect of taking a moving average of a random series will then be to 
generate an oscillatory series, provided that the weights are such as to give a positive 
correlation between successive members of the generated series, a condition which is always 
realised in moving averages employed for trend-fitting. We shall call this the Slutzky— 
Yule effect, after the two statisticians who (independently) studied it in detail. 

The generated series is not regular in the cyclical sense, that is to say its peaks and 
troughs do not recur at equal intervals of time, and the amplitudes of the oscillations vary 
considerably. Nevertheless such oscillations present a striking resemblance to the kind 
of movement which is found in practice, particularly in economic time-series, and we shall 
consider them in more detail in Chapter 30. For our present purposes we require to con- 
sider how far the process of trend-elimination itself may generate such effects in order 
to be sure that oscillatory movements in a trend-free series have not been put there, so 
to speak, by our own arithmetical processes. 
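The effect is easy to observe by simulation. The sketch below smooths a rectangular random series with a moving sum of fives of a moving sum of threes and compares the correlation between successive terms before and after smoothing. It is an illustration only: the seed, the helper names and the use of Python's pseudo-random generator in place of tables of random numbers are all assumptions.

```python
import random

def moving_sum(series, k):
    """Unweighted moving sum of k consecutive terms."""
    return [sum(series[i:i + k]) for i in range(len(series) - k + 1)]

def lag1_autocorrelation(series):
    """Sample correlation between successive terms of the series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)

rng = random.Random(0)                           # fixed seed, chosen arbitrarily
eps = [rng.randrange(20) for _ in range(480)]    # integral values 0..19, equally likely
smoothed = moving_sum(moving_sum(eps, 3), 5)     # equivalent weights [1, 2, 3, 3, 3, 2, 1]

r_raw = lag1_autocorrelation(eps)
r_smooth = lag1_autocorrelation(smoothed)
```

The raw series shows little correlation between neighbours, while the smoothed series is strongly and positively correlated, which is the condition under which oscillatory movements are generated.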


29.26. For this purpose we shall consider the period and variance of a series gen- 
erated by the Slutzky—Yule effect. 

Since the peaks and troughs do not recur at equal intervals there is no quantity which 
we can conveniently call the length of the oscillation. There will, in fact, be a distribution 
of lengths. We may define as the mean length either the mean period from peak to peak,
or that from trough to trough ; but this raises some difficulties as to whether we are pre- 
pared to admit as periods small ripples on the main undulation. 

Recognising its somewhat arbitrary character, we shall take as our measure of oscillatory
length the mean distance between “upcrosses”, that is to say the mean distance
between points where the series changes sign from negative to positive or “crosses the
x-axis”. Suppose the series is generated by a moving average with weights $a_1, \dots, a_k$
of a random variable which is normally distributed with variance v. Then the probability
that


$$u_t = \sum_{j=1}^{k} a_j \epsilon_j < 0 \tag{29.29}$$


and

$$u_{t+1} = \sum_{j=1}^{k} a_j \epsilon_{j+1} > 0, \tag{29.30}$$


i.e. that the generated series changes sign from negative to positive, is the proportional
frequency of


$$dF = \frac{1}{(2\pi v)^{\frac{1}{2}(k+1)}} \exp \Big\{ -\frac{1}{2v} \sum_{j=1}^{k+1} \epsilon_j^2 \Big\} \, d\epsilon_1 \dots d\epsilon_{k+1} \tag{29.31}$$


between the hyperplanes $\sum a_j \epsilon_j = 0$ and $\sum a_j \epsilon_{j+1} = 0$. This is proportional to the angle θ
between these two planes (the proportion being θ/2π), which is given by

$$\cos \theta = \frac{\sum_{j=1}^{k-1} a_j a_{j+1}}{\sum_{j=1}^{k} a_j^2}. \tag{29.32}$$


Hence the mean distance between upcrosses is 2π/θ, where θ is given by (29.32).
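Formula (29.32) translates directly into a short calculation. The sketch below, with the hypothetical function name `mean_upcross_distance`, evaluates 2π/θ for a given set of weights; the weights [1, 2, 3, 3, 3, 2, 1] are those of a moving sum of fives of a moving sum of threes, as used in Example 29.4.

```python
import math

def mean_upcross_distance(weights):
    """Mean distance 2*pi/theta between upcrosses, with cos(theta) from (29.32)."""
    numerator = sum(a * b for a, b in zip(weights, weights[1:]))
    denominator = sum(a * a for a in weights)
    theta = math.acos(numerator / denominator)
    return 2 * math.pi / theta

distance = mean_upcross_distance([1, 2, 3, 3, 3, 2, 1])   # about 15.5 units
```

The ratio in (29.32) is also the correlation between successive terms of the generated series, so the more strongly neighbouring terms are correlated, the longer the mean distance between upcrosses.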




29.27. In a similar way, the probability that

$$u_t - u_{t+1} > 0, \tag{29.33}$$
$$u_t - u_{t-1} > 0, \tag{29.34}$$


that is that $u_t$ is a peak of the series, is the angle between the two hyperplanes


$$\sum_{j=1}^{k} a_j \epsilon_{j+1} - \sum_{j=1}^{k} a_j \epsilon_j = 0, \tag{29.35}$$

$$\sum_{j=1}^{k} a_j \epsilon_j - \sum_{j=1}^{k} a_j \epsilon_{j-1} = 0, \tag{29.36}$$


and is given by

$$\cos \theta_1 = \frac{a_1 (a_2 - a_1) + (a_2 - a_1)(a_3 - a_2) + \dots + (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - a_k (a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \dots + (a_k - a_{k-1})^2 + a_k^2}. \tag{29.37}$$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to
mean distance between troughs.
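The peak formula can be evaluated in the same way as the upcross formula, by first forming the weights of the series of first differences and then applying the ratio of (29.32) to them. The sketch below is illustrative only; the function name is an assumption, and the weights are again those of a moving sum of fives of a moving sum of threes.

```python
import math

def mean_peak_distance(weights):
    """Mean distance 2*pi/theta_1 between peaks, with cos(theta_1) as in (29.37)."""
    # Weights of the first-difference series: a_1, a_2 - a_1, ..., a_k - a_(k-1), -a_k.
    diff = ([weights[0]]
            + [b - a for a, b in zip(weights, weights[1:])]
            + [-weights[-1]])
    numerator = sum(x * y for x, y in zip(diff, diff[1:]))
    denominator = sum(x * x for x in diff)
    theta_1 = math.acos(numerator / denominator)
    return 2 * math.pi / theta_1

peak_distance = mean_peak_distance([1, 2, 3, 3, 3, 2, 1])   # cos(theta_1) = 2/3
```

For these weights the difference weights are [1, 1, 1, 0, 0, −1, −1, −1], giving a cosine of 4/6 and a mean distance between peaks of roughly 7.5 units.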


29.28. If we wish to exclude “ripples” of a certain length d from consideration
we may inquire for the probability that (29.35) and (29.36) are satisfied in conjunction with

$$u_t \ge u_{t+d}. \tag{29.38}$$

This is evidently the area cut off on the unit sphere by the three planes (29.35), (29.36) and


$$\sum_{j=1}^{k} a_j \epsilon_j - \sum_{j=1}^{k} a_j \epsilon_{j+d} = 0. \tag{29.39}$$


If the angles between the planes are A, B and C this area is $A + B + C - \pi = \theta_2$, say.
The mean length between peaks, ripples excepted, is then $4\pi/\theta_2$.


Example 29.4 


In Table 29.7 we show 480 terms of a series of random numbers which can take integral 
values from 0 to 19, together with a moving sum of fives of a moving sum of threes. 
Fig. 29.6 shows a portion of the derived series graphically. There are 474 terms of the 
smoothed series. 

The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will be
found from the table to be 23, the first between the 19th and 20th term of the smoothed
series, the last between the 459th and the 460th. The mean distance between upcrosses
is then 440/22 = 20 units. How does this compare with the mean-distance given by
“normal” theory?

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (29.32) we have


$$\cos \theta = \frac{(1 \times 2) + (2 \times 3) + \dots + (2 \times 1)}{1^2 + 2^2 + \dots + 1^2} = \frac{34}{37} = 0.9189,$$

$$\theta = 23^\circ \, 14'.$$

Hence the mean distance = 360/23.233 = 15.5 units.


TABLE 29.7
Series of 480 Terms of a Rectangular Random Series ε and a [5] [3] smoothing S.
(Each row gives, for ten successive blocks of 48 terms, the term number t, the random value ε and the smoothed value S.)

t | ε | S, repeated for each of the ten blocks
ill 8! 49; 2) 61] 97) 8) 1124145} 18|136}193 | 16 | 147]241| 1] 99} 289 | 16 | 1889337 | 13 | 107 | 385 | 18 | 143 | 433 | 14/172 
2/15 50) 3) 71] 98! 1/1084146)18|140}194| 6/144]242/17| 80]200| 2/180]338| 6|134]386| 5| 1669434 | 14] 155 
8/15 51/10) 84] 99) 5/123]147| 6|131}195 | 10|132]243} 0| 754291 | 10| 158] 339 | 15 | 151] 387/17|170] 435! 9/133 
4) 8) 164152) 5| 914100/13|131]148| 0|/1214196| 5/1284244) 3| 73]292]12|1471340| 7|161]388/12/1794436/ 8/107 
5}19}147153|10| 924101)11/150f149| 4{/120f197| 7|1224245| 0| 94]293|15| 1454341 |13|160]389| 7|179}437| 3| 75 
6) 1/143]54) 3) 101] 102 | 14 | 151}150/ 11! 137]198| 8|126)246| 6|124}204| 5|148}342|13|1621390/14|184}438| 1] 53 
7) 3) 145955) 2/119}103| 6 | 140] 151 | 15 | 162}199| 18 | 120] 247|17| 169} 295| 6 | 1451343| 5/155]301/15/190] 439] 3] 55 
8) 12/165 [56 | 11 | 141} 104 | 13 | 120] 152 | 15 | 1791200} 0| 121] 248 | 16 | 195} 296 | 15 | 134] 344 | 15) 153] 392/11|194}440| 1| 72 
9) 19/175 | 57 | 14 | 1669105 | 1/1194153|11|188]201| 7/105] 249/17 | 2014297! 6 | 1371345 | 10 | 1621393 |13 | 2011441! 5| 91 
10/13) 196 [58 }18|190}106| 4/120}154| 9|184}202| 9| 99}250|15 | 191] 298 |13| 1364346] 3|174}304/18|199]442/16| 96 
11) 16) 1914159) 8 | 212] 107 | 13 | 133] 155 | 12| 183] 203| 6| 934251! 3/175]299| 2] 1364347|18/176]395| 6|193] 443] 8| 91 
12| 4/178] 60) 14 | 211} 108 | 13 | 147] 156|18|175}204] 3| 959252) 14 | 150} 300 | 14 | 129] 348 | 19| 177] 396/19| 1784444] 2) 78 
18/17) 159] 61) 15 | 204}109| 8/1721157!| 7|174]205/12| 911253) 9/144]301! 9|128]349| 8}1744397/13|173]445| 0| 75 
14) 8/150] 62/17 | 191} 110 | 12| 186 }158 | 15|160}206| 4] 93}254/11|131]302| 411151350] 5/173§398| 11178] 446| 2| 85 
15] _6)134763) 7) 1859111) 12|195/159| 5) 157]207| 2| 974255| 3|135]303 | 14} 100] 351 | 16 | 159 F399 |13 1178) 447| 7/109 
16/15 / 118} 64! 9 | 166] 112 | 19 | 206] 160 | 11 | 141 J208 | 11! 979256 )15)125]304] 2) 934352] 7/157] 400] 18 | 183] 448] 17 | 124 
17| 3) 101] 65 |11 | 160} 113 | 13 | 203}161| 9}140}209| 6| 1071257) 1/138}305! o| 964353|16|1574 401/17 | 191 | 449 | 12 | 124 
18] 3) 88} 66|14 | 167} 114 | 11 | 184} 162 | 14] 122}210| 7/115) 258/14|142}306| 8|109]354| 8/169} 402! 3|205}450] 5/117 
19} 7| 87167) 5| 1889115118 | 1561163 117]211} 6|128]259| 9} 162]307|12)131]355| 6/|168]403/18/198]451} 2/106 
20| 4/100}68/17|203}116| 2|135]164/11| 94] 212] 15 | 125] 260] 13 | 166] 308 | 10 | 159 | 356 | 15 | 165} 404 |14| 1924452] 2] 97 
21} 5/126]69/18 | 204]117| 4/121]165|) 1} 984213] 4/130] 261|11 | 182] 309 | 11] 1789357 | 19 | 153] 405 | 14 | 191 f 453/15] 92 
22/14) 140) 70] 13 | 2054118 | 10} 111}166| 8] 934214} 13 | 1269 262 | 15| 1901310] 17| 1874358) 4/150] 406/13 | 197} 454| 8] 100 
23/15 |147171)18 | 185/119] 8|116]167| 2/106]215; 4/125}263| 8| 2031311] 10|200]359| 5|133]407| 5/205)455| 2/111 
24/10/1509 72) 0 | 171}120)| 210 | 131] 168|18| 103,216) 7|123}264/17 | 210} 312 | 12| 2121360] 9|/120)408|19] 204,456] 4/120 
25 | 3/153173/19| 1491121! 3|1451169| 1] 1211217 | 13) 119] 265 | 19 | 214) 313 | 17 | 216 | 361 | 12 | 117 4.409 | 37 | 202] 457 | 11 | 121 
26/10 /156474) 1) 146]122 16 | 156]170! 7) 117} 218| 4) 1114} 266/10 | 211] 314 | 17} 211] 362] 2|1274410|18| 1929 458 | 15/119 
27 | 13 | 165 | 75 | 14 | 1301123 | 12) 1734171! 9| 127) 219] 13 | 101 | 267 | 17 | 188] 315 | 16 | 192} 363 | 11|118}411| 5/1744 459| 8/110 
28 | 14\175 76 | 12 | 135}124| 8|175]172|13|120}220| 0] 914268/11/163]316| 9|/1731364/12/112)412| 7/140] 460| 3] 98 
29/16 /168}77| 2) 139}125 | 19/160]173| 2/137]221| 3] s2}269) 9|146}317| 9|151}365| 5| 105) 413/15 | 107} 461] 1| 98 
30] 8) 160978] 7 | 160} 126 | 11 | 1454174 | 16/139}222/11| 761270| 1/1534318] 2|1524366! 0|100)414| 1] s6f4e2| 4) 121 
31/10/1549 79 | 16 | 175}127| 1/129}175| 1/145]223| 0| 759 271/111 1544319 |16|160)367(12| 849415| 4] 66] 463/13 | 150 
2 1) 156980 15 Iss8f128| 4) 1930176 | 17| 142] 224] 10! 72] 272/17 | 162] 320) 10|1854363| 6] ss}4ie| 2| 584464|17|170 
33 | 18 | 15481 | 17 | 197} 129 | 16 | 108] 177 | 13 | 145]}225| 1) 986)273|17|155]321117|196}369) 2| 964417| 3| 50]465/19/176 
34/17 | 165182} 11 | 200]130| 3)115]178! 0|149]226| 4] 92}274| 4/|154]322/15|209]370| 4/104}418] 2| 62]466] 5/169 
85] 4/164]83} 6 | 206}131 | 13 | 108}179| 15 | 157]227| 6|1094275| 8| 1374323 | 13 | 1941371 | 15 | 109] 419|10| 781467) 4/149 
36 | 10 | 159] 84/17 | 2154132) 0|118}180| 7| 166] 228/18|116]276) 2|134]324|10/1791372| 6|1301420| 0/105] 468 | 15/136 
37 | 16 | 138 | 85 | 18 | 2284133 | 10| 112} 181 | 16 | 1674229| 3| 1399277118 | 141] 325 |18|1519373| 5|148]421| 9) 1261469) 8/137 
38 | 2/137 186/19 | 230]134| 4|122]182|16|171]230| 7/1491278| 8|172]326| 0/1334374 | 14| 156] 422|16|146]470| 6/136 
39 | 13 | 131 } 87 | 15 | 220] 135 | 19 |113]183| 7|169]231|12|149}279| 9|184]327] 9/1129375/14| 164} 423| 9] 152)471/ 14 | 133 
40) 3) 140]88/13/ 1984136] 3/110]184| 6| 174} 232 | 15 | 141 | 280] 19 | 1851328} 8| 1084376 | 11| 180] 424 /11/ 1449472] 9/126 
41|14/135789|/ 8|175}137| 4/100} 185 | 13 | 168 233 | 11] 137] 281 | 17|167]329| 3/1059377| 8]1971425 |12|1241473| 0/125 
42) 7/146]90|10/159]138| 7|103}186|17|170}234| 1|134]282| 411501330) 9|1111378|15|1749426| 2/106] 474/15 | 109 
43 | 16/141] 91) 14 | 158,139] 0|106]187|14|170]235| 8|128283! 8|1234}331 | 12/ 1074379 | 18] 1511427| 3/106]475| 7/103 
44) 3/139]92| 5|158]140|16]107]188| 2/1591236] 9|130284| 5|115]332/ 3/101}380| 7]127]428| 9/119]476| 5| 96 
45 | 10 | 117 § 93 | 12 | 159 J 141 | 13 | 102] 189 | 15 | 140] 237 | 14. | 1324 285| 6|1311333| 8| 85)381| 1| 99}429| 6|139}477| 1] 95 
46/12) 96/94/18] 153]142| 0/103]190] 9| 1394238! 9|128}286| 7/1681334| 5| 779382| 2] 88)430\17| 159] 478|11 
47) O| 75)95} 1/145]143) 2|/114]101| 1|145]239| 6 | 122]287|/19) 1864335) 1| 754383) 7| 89} 431/15 |174}479| 5 
48) 3] 65/96 (14) 1247144; 4/127}192 11511517240) 7/108] 288/19 |1961336| 2} 931384] 41119]432| 5/1794480| 5 


[Fig. 29.6. Graph of the Last 117 Terms of the Series S of Table 29.7. Vertical axis: Value of Series S. Horizontal axis: Number of Term t.]


384 TIME-SERIES 


The observed mean distance is 20.0 units, but this is based on rectangular variation, and
we are, perhaps, entitled to expect some difference from normal theory. For rectangular
random variables, values distant from the mean occur more frequently, and it is not
surprising to find oscillations in the series which do not result in upcrosses.

The number of peaks in the series will be found to be 62, the first at the seventh term,
the last at the 466th. Hence the mean distance between peaks is $\frac{459}{61} = 7.5$ units. From
formula (29.37) we find

$$\cos\theta = \tfrac{2}{3}, \qquad \theta = 48^\circ\,11'.$$

Thus the theoretical mean distance is $\frac{360}{48.187} = 7.5$ units, in good agreement with
experiment. It will be observed that several of the distances between peaks are due to very
small ripples.

From a number of experiments Dodd (1939a) concluded that series generated from
rectangular material conformed fairly well to normal theory.
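The arithmetic of this comparison can be checked with a short sketch. It assumes only the figures quoted in the text (62 peaks spanning 459 terms, and a cosine of 2/3 from formula (29.37)); the function name is hypothetical and introduced purely for illustration.

```python
import math


def mean_distance_from_cos(cos_theta: float) -> float:
    """Mean distance 360/theta, where theta (in degrees) has the given cosine."""
    theta_deg = math.degrees(math.acos(cos_theta))
    return 360.0 / theta_deg


# 62 peaks, first at term 7 and last at term 466, give 61 intervals over 459 terms.
observed = 459 / 61
# Theoretical value for cos(theta) = 2/3, as quoted in the text.
theoretical = mean_distance_from_cos(2 / 3)

print(round(observed, 2), round(theoretical, 2))
```

Both figures round to 7.5, which is the agreement the text refers to.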


29.29. Let us now examine how the variance of the induced oscillation compares 
with the variance of the original random series. 

The sum of k random elements with variance v has variance kv and its mean has
variance v/k. It does not follow that a simple moving average has a variance 1/k times
that of the random element, because of correlations between successive members in the
derived series. If the original series was $\epsilon_1, \ldots, \epsilon_n$ the derived series is, with weights
$a_1, \ldots, a_k$,

$$\begin{aligned}
a_1\epsilon_1 + a_2\epsilon_2 + \dots + a_k\epsilon_k &= \eta_1, \text{ say} \\
&\;\;\vdots \\
a_1\epsilon_{n-k+1} + a_2\epsilon_{n-k+2} + \dots + a_k\epsilon_n &= \eta_{n-k+1}
\end{aligned} \qquad (29.40)$$

The expected value of the sum of these values is zero since the expected value of $\epsilon$ may be
taken to be so. Since there are n − k + 1 terms we have for the variance

$$\frac{1}{n-k+1}\sum_{j=1}^{n-k+1}\eta_j^2. \qquad (29.41)$$

The expected value of this, since the $\epsilon$'s are independent, is

$$\frac{1}{n-k+1}\,(n-k+1)\,(a_1^2 + a_2^2 + \dots + a_k^2)\,v = v\sum_{j=1}^{k}a_j^2. \qquad (29.42)$$


In particular, if the a’s are all equal to 1/k, the expected value of the variance is v/k. This 
gives us the average reduction in the variance. 

If a simple average of extent k is iterated q times the weights are the successive
coefficients in

$$\frac{1}{k^q}\,(1 + x + x^2 + \dots + x^{k-1})^q.$$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$$\frac{1}{k^{2q}}\,(1 + x + \dots + x^{k-1})^{2q} = \frac{1}{k^{2q}}\left(\frac{1 - x^k}{1 - x}\right)^{2q}, \qquad (29.43)$$

and this gives the average reduced variance for a simple average of k iterated q times.
The following are the values of the reducing factor for some of the values of k and q :—


                         q
   k       1       2       3       4       5
   3     0.33    0.23    0.19    0.17    0.15
   4     0.25    0.17    0.14    0.12    0.11
   5     0.20    0.14    0.11    0.10    0.09
   6     0.17    0.11    0.09    0.08    0.07
   7     0.14    0.10    0.08    0.07    0.06


Evidently the result of the first moving average is to generate a series with a much 
lower variance than that of the original random element, but the second and succeeding 
iterations do not reduce the variance further to the same extent. In the case k = 7 the 
first averaging reduces the variance to one-seventh, but the next three reduce it only by 
a further half. 
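The reducing factors can be recomputed directly as a check. The sketch below is an illustration under the stated definition only: it builds the weights of a simple average of extent k iterated q times by repeated convolution and sums their squares. The function names are hypothetical.

```python
from fractions import Fraction


def iterated_average_weights(k: int, q: int) -> list[Fraction]:
    """Weights of a simple average of extent k applied q times in succession."""
    weights = [Fraction(1)]
    simple = [Fraction(1, k)] * k
    for _ in range(q):
        out = [Fraction(0)] * (len(weights) + k - 1)
        # Convolve the current weights with the simple average.
        for i, w in enumerate(weights):
            for j, s in enumerate(simple):
                out[i + j] += w * s
        weights = out
    return weights


def reducing_factor(k: int, q: int) -> Fraction:
    """Sum of squared weights: the factor multiplying the random variance v."""
    return sum(w * w for w in iterated_average_weights(k, q))


for k in range(3, 8):
    print(k, [round(float(reducing_factor(k, q)), 2) for q in range(1, 6)])
```

For k = 3 and q = 2, for instance, the weights are (1, 2, 3, 2, 1)/9 and the factor is 19/81, or 0.23 to two places.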


29.30. To apply such results in practice we require an estimate of the variance of 
the random element in the original series. If this is available we can estimate the variance 
of the generated series and also, from 29.26, the mean distance between upcrosses or 
between peaks. If then our residual series, after the elimination of trend, showed an oscilla- 
tory movement with this variance and these mean-distances, within sampling limits, we 
could not conclude that the oscillatory effect was real. It could have been induced by 
our method of eliminating trend. 

In the present state of knowledge it is not possible to assign permissible limits of 
sampling variation by relation to standard errors in the usual way. Whether any particular 
effect is significantly different from the values of the series generated from the random 
element remains, therefore, a matter of subjective judgment to some extent. The sampling 
problems involved are formidable, but there does not seem any reason why they should 
not be capable of explicit solution. This field of study awaits the attention of the theorist. 


Example 29.5 


For the data of Table 29.3 (sheep population of England and Wales) trend was elimi- 
nated by a simple average of nines, the resulting residuals being shown in Table 29.8. 
A glance at the series suggests some sort of oscillatory effect, since the signs of terms cluster 
together. By the methods of the next chapter the effect may be brought into greater 
prominence. The data themselves, however, indicate a mean-distance between upcrosses 
of about 8 or 9 years, and actual calculation gives a variance of 8474. Can this be due 
to the operation of our trend-elimination on a random element in the original series ? 

For the mean distance between upcrosses due to a simple nine-point average we have

$$\cos\theta = \tfrac{8}{9}, \qquad \theta = 27^\circ\,16',$$

and the mean distance is $\frac{360}{27.27} = 13.2$ approximately. This is considerably in excess of
our observed value, but not sufficiently so to reject outright the possibility we are examining.

Since, however, the variance of residuals is 8474 this must, to have been generated
from a random series by a simple average of nines, derive from a random element with


TABLE 29.8 


Residual Values of the Sheep Series of Table 29.3 after Elimination of Trend by a Simple 
Nine-Point Moving Average. 


Year.   Residual (10,000).   Year.   Residual (10,000).   Year.   Residual (10,000).
1871 — 176 1893 | + 34 1915 + 19 
72 — 112 94 | —103 16 + 128 
73 + 50 95 — 104 17 + 97 
74 + 141 96 — 15 18 + 69 
75 + 60 97 — 23 19 — 
76 — 20 98 + 17 20 = 174 
a + 12 99 + 71 21 — 107 
78 + 82 1900 + 35 22 ey 
79 + 130 01 + 16 23 — 109 
80 = 8 02 — 27 24 — 23 
81 — 166 03 — 32 25 + 60 
82 — 179 04 — 49 26 4+ 121 
83 a 05 ~— 61 27 + 94 
84 + 38 06 — 62 28 — 25 
85 + 97 07 — 24 29 — 90 
86 + 8 08 + 68 30 — 
87 ~ 6 09 +141 31 + 72 
88 — 105 10 + 119 32 + 152 
89 | — 99 11 + 66 33 + 112 
90. | + 35 12 — 52 34 — 64 
91 + 159 13 = 35 — 87 
92 + 167 14 — 61 


variance 76,266. An estimate of the variance of the random element in the original series, 
obtained by the variate-difference method which we describe below, was only 350 approxi- 
mately. Making every allowance for sampling effects, we cannot do otherwise than reject 
decisively the possibility that the residual oscillation is spurious in the sense of having 
been induced into the data by the effect of the elimination of trend on a random element. 


29.31. We may summarise the foregoing discussion of trend-elimination as follows :— 

(a) The conception of a trend as a “smooth” or “regular” movement is equivalent
to the supposition that trend can be represented, at least locally, by a smooth mathematical 
function and in particular by a polynomial in the time-variable. 

(b) Certain series can be treated on lines formally equivalent to regression analysis ;
but a more generally applicable procedure is to represent the trend by a moving 
parabolic arc. 

(c) The moving arc of best fit in the least-squares sense gives values which are deriv- 
able from a moving average of the data. The weights of this average are to some extent 
at choice, according to the extent of the average and the closeness of fit required in the 
moving arc. 

(d) A moving average of extent k sacrifices (k − 1) terms, in the sense that the derived
series is (k − 1) terms shorter than the original series. If the series is short it is usually
desirable to keep this loss to a minimum, that is, to keep the extent of the average as
short as possible.

(e) A moving average may distort genuine oscillatory effects, in general exaggerating
the shorter variations at the expense of the longer ones, and may induce spurious oscillatory
phenomena by its action on random residuals. For harmonic components the effect is
minimised by taking the average as simple, with extent equal to the period of the
component. For random components the effect is minimised by making the sum of squares
of weights in the average a minimum, i.e. by using a simple average.


29.32. In the theory of time-series there are very few rules which can be laid down 
without a good deal of proviso and caveat. It will be evident from the foregoing that there 
is no golden rule in trend-fitting which can be applied irrespective of individual circum- 
stances. If we desire to get a close fit to the data we must use a parabola of fairly high 
order, which involves a moving average with weights which are far from equal. This, 
however, increases the danger of obscuring the true oscillations in the residuals. In 
most practical cases it is necessary to strike a balance between conflicting requirements 
by intuitive judgment as to the appropriate moving average to use. 


The Variate-difference Method 

29.33. We now proceed to consider the random constituent of a time-series. From 
the very nature of random variation we cannot expect to derive any formula, however 
approximate, which will measure the random component directly at any given point of 
the series. The best we can hope to do is to determine the non-random components and 
to obtain a random residual which is left unaccounted for by those components ; and even 
this, as we shall see in the next chapter, is not a very strong hope when oscillations appear
in the series. 

On certain assumptions, however, we may determine the variance of the random 
component and hence obtain a general idea of its magnitude and importance. Suppose 
that the systematic part of the series can be represented, at least locally, by a polynomial. 
Then successive differencing of the series will gradually eliminate the polynomial element 
but will not reduce the random element correspondingly. As we proceed with the differ- 
encing, the random element becomes more and more predominant until finally the syste- 
matic component is negligible. Hence we can determine effectively the variance of the 
random component in the differenced series, and by a simple calculation derive an estimate 
of that in the original series. 


29.34. Consider the differencing of a random series $\epsilon_t$. We have

$$\Delta\epsilon_t = \epsilon_{t+1} - \epsilon_t \qquad (29.44)$$

$$\Delta^r\epsilon_t = \epsilon_{t+r} - \binom{r}{1}\epsilon_{t+r-1} + \binom{r}{2}\epsilon_{t+r-2} - \dots + (-1)^r\epsilon_t. \qquad (29.45)$$

Without loss of generality we may suppose that the mean value of $\epsilon_t$ is zero, and thus

$$E(\Delta^r\epsilon_t) = 0. \qquad (29.46)$$

Hence

$$\begin{aligned}
\operatorname{var}(\Delta^r\epsilon_t) &= E\left\{\epsilon_{t+r} - \binom{r}{1}\epsilon_{t+r-1} + \dots + (-1)^r\epsilon_t\right\}^2 \\
&= v\left\{1 + \binom{r}{1}^2 + \binom{r}{2}^2 + \dots + 1\right\}.
\end{aligned}$$

The sum in curly brackets is easily evaluated from the consideration that it is the coefficient
of $x^r$ in $(1 + x)^r (x + 1)^r$, that is, equals $\binom{2r}{r}$. Hence

$$\operatorname{var}(\Delta^r\epsilon_t) = v\binom{2r}{r}. \qquad (29.47)$$

We may then derive an estimate of v by writing

$$v = \frac{\text{second moment of } \Delta^r\epsilon_t \text{ about zero}}{\binom{2r}{r}}. \qquad (29.48)$$

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r\epsilon_t$, since the mean is known to be zero. This shortens the arithmetic to some extent.

The factor $\binom{2r}{r}$ for r = 1 to 10 has the following values :—

    r      C(2r, r)      1 / C(2r, r)
    1             2      0.5
    2             6      0.166667
    3            20      0.05
    4            70      0.0142857
    5           252      0.00396825
    6           924      0.00108225
    7         3,432      0.000291375
    8        12,870      0.0000777001
    9        48,620      0.0000205677
   10       184,756      0.00000541254


29.35. Basing itself on equation (29.48) the method of variate-differences proceeds 
as follows: We difference the series once, find the second moment about zero of the result- 
ant and divide by 2; we then difference again and find the second moment about zero, 
dividing in this case by 6; and so on. If the successive estimates of v decrease, we con- 
tinue with the differencing. There will, in general, come a point when they cease decreasing 
and remain constant within sampling limits (which may be rather wide). At this stage 
we may suppose that we have eliminated the systematic element in the original series. 
The final estimate gives us an estimate of the variance of the random element in the original 
series, and the order of the difference to which we have had to go will give an indication 
of the degree of the polynomial representing the systematic component. 
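The procedure just described can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the test series (a straight line plus rectangular random integers from 0 to 99) is invented for the purpose, and no sampling limits are computed.

```python
import random
from math import comb


def difference(series):
    """First differences u[t+1] - u[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def variate_difference_estimates(series, max_order):
    """Estimates of v from differences of order 1..max_order.

    Each estimate is the second moment about zero of the r-th differences,
    divided by the binomial coefficient C(2r, r).
    """
    estimates = []
    current = list(series)
    for r in range(1, max_order + 1):
        current = difference(current)
        second_moment = sum(x * x for x in current) / len(current)
        estimates.append(second_moment / comb(2 * r, r))
    return estimates


# Hypothetical data: a linear trend plus rectangular random integers 0..99.
rng = random.Random(0)
series = [5 * t + rng.randrange(100) for t in range(200)]
for r, est in enumerate(variate_difference_estimates(series, 6), start=1):
    print(r, round(est, 1))
```

With such data one would expect the estimates to settle near the variance of the random integers after the first differencing, since the systematic part is linear; a purely quadratic series gives estimates of exactly zero from the third difference onward.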


Example 29.6 


Let us apply the variate-difference technique to the series of Table 29.6. We know 
from the method of constructing the series that the systematic part ought to be completely 
eliminated after the third differencing, and also that the random part consists of an element 
with variance 833 approximately. In fact, the random numbers from 1 to N have a 


variance (N² − 1)/12 and N in this case is 100. The actual variance of the random element
in Table 29.6 is 843. 


TABLE 29.9
Differences of the Series u_t of Table 29.6.

   t     u_t     Δ¹     Δ²     Δ³     Δ⁴      Δ⁵      Δ⁶

   1     −96     −6     67    155    279     508    1050
   2     −90    −73    −88   −124   −229    −542   −1297
   3     −17     15     36    105    313     755    1524
   4     −32    −21    −69   −208   −442    −769   −1141
   5     −11     48    139    234    327     372     271
   6     −59    −91    −95    −93    −45     101     361
   7      32      4     −2    −48   −146    −260    −229
   8      28      6     46     98    114     −31    −625
   9      22    −40    −52    −16    145     594    1661
  10      62     12    −36   −161   −449   −1067   −2252
  11      50     48    125    288    618    1185    1978
  12       2    −77   −163   −330   −567    −793    −876
  13      79     86    167    237    226      83    −159
  14      −7    −81    −70     11    143     242     137
  15      74    −11    −81   −132    −99     105     551
  16      85     70     51    −33   −204    −446    −655
  17      15     19     84    171    242     209     −64
  18      −4    −65    −87    −71     33     273     690
  19      61     22    −16   −104   −240    −417    −629
  20      39     38     88    136    177     212     216
  21       1    −50    −48    −41    −35      −4     175
  22      51     −2     −7     −6    −31    −179    −650
  23      53      5     −1     25    148     471    1110
  24      48      6    −26   −123   −323    −639    −975
  25      42     32     97    200    316     336      41
  26      10    −65   −103   −116    −20     295     925
  27      75     38     13    −96   −315    −630    −965
  28      37     25    109    219    315     335     207
  29      12    −84   −110    −96    −20     128     316
  30      96     26    −14    −76   −148    −188     −32
  31      70     40     62     72     40    −156    −798
  32      30    −22    −10     32    196     642    1597
  33      52    −12    −42   −164   −446    −955   −1719
  34      64     30    122    282    509     764     950
  35      34    −92   −160   −227   −255    −186     141
  36     126     68     67     28    −69    −327    −991
  37      58      1     39     97    258     664    1515
  38      57    −38    −58   −161   −406    −851   −1492
  39      95     20    103    245    445     641     707
  40      75    −83   −142   −200   −196     −66     281
  41     158     59     58     −4   −130    −347    −685
  42      99      1     62    126    217     338     509
  43      98    −61    −64    −91   −121    −171    −314
  44     159      3     27     30     50     143     432
  45     156    −24     −3    −20    −93    −289    −745
  46     180    −21     17     73    196     456
  47     201    −38    −56   −123   −260
  48     239     18     67    137
  49     221    −49    −70
  50     270     21
  51     249


Table 29.9 shows the series and the differences up to Δ⁶. For the sums of squares
in the various columns $S_j$, corresponding to $\Delta^j$, we find—

    S_1 =    107,541
    S_2 =    318,115
    S_3 =  1,033,513
    S_4 =  3,445,308
    S_5 = 11,720,069
    S_6 = 40,548,844

To obtain second moments we divide by 51 − j and then, to obtain the estimate of v,
by $\binom{2j}{j}$. We find the following :—

    j     Estimate.
    1     1075.41
    2     1082.02
    3     1076.58
    4     1047.21
    5     1011.05
    6      975.20


Curiously enough, the estimate for j = 2 is higher than that for j = 1 and there is
little difference between the various estimates. In the ordinary way we should have 
concluded that the systematic component was adequately represented by a polynomial 
of order 1, that is to say a straight line, and that the residual random element had a variance 
of about 1000. 
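The estimates of this example follow from the sums of squares by one division each, and may be verified as below. The sketch assumes the series length of 51 stated in the text; the value used for S_2 is recomputed here from the second-difference column of Table 29.9 and is therefore an assumption, and the function name is hypothetical.

```python
from math import comb

n = 51  # number of terms in the series of Table 29.6

# Sums of squares of the j-th differences. S_2 is recomputed from the table
# (an assumption); the other values are those quoted in the example.
sums_of_squares = {
    1: 107_541,
    2: 318_115,
    3: 1_033_513,
    4: 3_445_308,
    5: 11_720_069,
    6: 40_548_844,
}


def estimate(j: int, s_j: int, n_terms: int = n) -> float:
    """Second moment about zero, S_j / (n - j), divided by C(2j, j)."""
    return s_j / ((n_terms - j) * comb(2 * j, j))


estimates = {j: round(estimate(j, s), 2) for j, s in sums_of_squares.items()}
print(estimates)
```

The six values agree with the estimates tabulated in the example, including the slight rise from j = 1 to j = 2.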

The reader must not be surprised to find discrepancies of this kind between theory 
and experiment in short series; and the discrepancy is not, in fact, as big as it seems. 
The variance of the original series is 6272.61. The mean square of the first difference,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated by
the first differencing, and the method indicates, quite correctly, that the greater part of
the systematic element is linear. The random element is rather large compared with the
non-linear systematic terms, and the latter have got caught up in it; the series is too short
for the variate-difference method to disentangle them. Consider, for instance, the cubic
term $\frac{1}{100}(t - 26)^3$. In the original series this varies in value from −156.25 to +156.25.
First differences reduce it to $\frac{3}{100}(t - 26)^2$, varying from 18.75 through zero to 18.75,
whereas the random element is increased in range from 0 to 198. Already the systematic 
term is being swamped by the random element, and a slight degree of accidental correlation 
between the two can easily account for the increase in the mean-square of second differences. 

The matter may be put in a slightly different way. Suppose that, relying on the 
variate-difference method, we regarded the data as represented by a linear equation plus 
a random residual. If we fitted a straight line by least squares and examined the residuals, 
we should probably find very little evidence of departure from randomness. This repre- 
sentation would differ from the mode of construction of the series, but it would be a possible 
method of construction. Only the failure of the representation to conform to further 
terms of the series would reveal its weakness. 




29.36. The variate-difference method thus provides a kind of lower limit to the 
degree of the polynomial which will represent a series locally or generally. There remains 
for consideration the question as to what sort of differences between successive estimates 
of v can be regarded as chance effects, in order to decide when the value has reached a 
stationary level. The sum of squares $S_r$ is a constant factor times the second moment,
but as its members are correlated among themselves we cannot use the variance of the
second moment to test its significance. Further, $S_r$ and $S_{r+1}$ are correlated. We proceed
to derive the sampling variance of their difference, the somewhat complicated formulae 
being due to Anderson (1914). 


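29.37. Write

$$b_j = \binom{r}{j}. \qquad (29.49)$$

Then we have, as in (29.42),

$$E\left\{\frac{1}{(n-r)\binom{2r}{r}}\sum(\Delta^r u)^2\right\} = \mu_2, \qquad (29.50)$$

where $\mu_2$ is the variance of u. Further

$$\begin{aligned}
E\left\{\sum(\Delta^r u)^2\right\}^2 = E\big[\,&\{b_0u_{r+1} - b_1u_r + b_2u_{r-1} - \dots + (-1)^rb_ru_1\}^2 \\
+\,&\{b_0u_{r+2} - b_1u_{r+1} + b_2u_r - \dots + (-1)^rb_ru_2\}^2 \\
+\,&\dots \\
+\,&\{b_0u_n - b_1u_{n-1} + b_2u_{n-2} - \dots + (-1)^rb_ru_{n-r}\}^2\,\big]^2. \qquad (29.51)
\end{aligned}$$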


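Consider first of all the terms in this which result in fourth powers of u. They will
derive from

$$\begin{aligned}
E\{&b_0^2u_{r+1}^2 + b_1^2u_r^2 + \dots + b_r^2u_1^2 + b_0^2u_{r+2}^2 + b_1^2u_{r+1}^2 + \dots + b_r^2u_2^2 + \dots \\
&+ b_0^2u_n^2 + b_1^2u_{n-1}^2 + \dots + b_r^2u_{n-r}^2\}^2 \\
= E\{&b_0^4(u_n^4 + u_1^4) + (b_0^2 + b_1^2)^2(u_{n-1}^4 + u_2^4) + (b_0^2 + b_1^2 + b_2^2)^2(u_{n-2}^4 + u_3^4) + \dots \\
&+ (b_0^2 + b_1^2 + \dots + b_{r-1}^2)^2(u_{n-r+1}^4 + u_r^4) + (b_0^2 + b_1^2 + \dots + b_r^2)^2(u_{n-r}^4 + \dots + u_{r+1}^4)\}. \qquad (29.52)
\end{aligned}$$

Writing now

$$B_0 = b_0^4 + (b_0^2 + b_1^2)^2 + \dots + (b_0^2 + b_1^2 + \dots + b_{r-1}^2)^2 \qquad (29.53)$$

$$A_0 = (b_0^2 + b_1^2 + \dots + b_r^2) = \binom{2r}{r}, \qquad (29.54)$$

we see that the term in $E(u^4)$ is

$$\{A_0^2(n - 2r) + 2B_0\}\,E(u^4). \qquad (29.55)$$

The only other term appearing from (29.51) will be of type $E(u_l^2u_m^2)$, $l \neq m$. If the reader
will write out the expansion of (29.51) he will find that the coefficients are expressible in
terms of

$$A_j = (b_0b_j + b_1b_{j+1} + \dots + b_{r-j}b_r) = \binom{2r}{r+j} \qquad (29.56)$$

and

$$B_j = (b_0b_j)^2 + (b_0b_j + b_1b_{j+1})^2 + \dots + (b_0b_j + b_1b_{j+1} + \dots + b_{r-j-1}b_{r-1})^2. \qquad (29.57)$$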




The expression for H(A’ u)* reduces to— 
(n — 2r) A? E (u*) + 4 { (mn — 2r +1) A} 4+ (mn — 2 4 2) A? 4... 
+ A? (n — 2r + 1) } B (uj u2,) + 2B5 E (us) 
+8{BP4+Bi+...+587., 47 BE wa) zeae 


Substituting $\mu_4$ for $E(u^4)$ and $\mu_2^2$ for $E(u_l^2u_m^2)$, dividing by $(n - r)^2\binom{2r}{r}^2$ and subtracting
$\mu_2^2$, we find the sampling variance of the estimate of v. The expression can, however, be
simplified to some extent. Putting
Pas) 


2 = SG Gia) 2G) G4) PAG) Gaal 


j=0 j=0 ; j=0 


( . we (29.59) 


we find, after lengthy algebraic rearrangement, 


(*) | 
Die 2r 
nes is LUE obs r ; r <4n. E é - (29.60) 


<7) /2r\? (a —7) 
saat pats 


If terms of order (n — r)~? can be neglected, this reduces to 


n—fT Qr\?2n —Pr 
r 


or, using the Stirling approximation to factorials,

$$\frac{1}{n - r}\left\{\mu_4 - 3\mu_2^2 + \mu_2^2\sqrt{(2r\pi)}\right\}, \qquad (29.62)$$

which is a fair approximation to (29.61), being within 3 per cent. for r as low as 6.
When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.


ee, 
— 2 A 
Ms BITE 2r 22 : . ; ee ail) 


29.38. In a similar way it may be shown that 
Sr ; Sp44 7 
cov m—n (7) (a 
VF r+1 
2a 
= Le) ey, 2 
n—r?r ( )( )@—r—1) 
Va rt] 


(” + | 
2 
a 2us 2r 2n — 27 — i _ 7 aa 1 : (29.63) 


n—?T (7) 2r+2\ n—r—l 2(n —r —1) 
ee r+ 




where 
Tat r—2 
: Eve re - 1\* reve a a 
l= ; : 2 : : On Marist s : 
i (5) ee) &(5) ee os a TY Opals 
From (29.60) and (29.63) we can determine the variance of the difference of 
S S 
one SG ee — em 
Dy 2r +2 
m— (77) (n—r ie) 


The general formula is complicated, but for normal variation, large n and r > 6 we have, 
analogously to (29.62), 


Cee Spt 
=f an(®) worn) 


S 2 
_— (3r +.1)/(2ar) r 
— 2 (ar + lee aoa(™h . (29.64) 


r 


The arithmetic application of the formulae has been facilitated by the preparation of tables 
of the constants involved. Reference may be made to Tintner (1940) who gives tables 
prepared by himself, Anderson and Zaycoff. 


Example 29.7 


For the data of Table 29.3 (sheep population) an application of the variate-difference 
method up to the tenth difference gave the following results :— 


    r     S_r / {(n − r) C(2r, r)}
    1     3468
    2     1442
    3      854
    4      629
    5      518
    6      448
    7      401
    8      371
    9      357
   10      347


The values here are falling steadily from r = 1 to r = 10, but very slightly towards
the end. From (29.64) for r = 6 we have for the variance of the difference, 80.7
approximately and for r = 10, 25.8 approximately. It appears that the reduction in variance
at r = 10 is losing significance, and that a moving arc of degree 10 would be sufficient to
eliminate the systematic component. It does not, of course, follow that the trend-line
must be of this degree, for we may not want to eliminate the oscillatory movements in
the trend-line.


29.39. The variate-difference method will clearly not eliminate systematic effects
such as periodic terms with very short period. Consider, for instance, the series 1, −1,
1, −1, etc. The first differences give us a series 2, −2, 2, −2, etc., second differences
4, −4, 4, −4, etc., and so on. The variance of the series of rth differences is, neglecting
effects due to the shortness of the series, $2^{2r}$ times that of the original, and the quotient
when this is divided by $\binom{2r}{r}$ tends to

$$\frac{2^{2r}(r!)^2}{(2r)!} \to \sqrt{(\pi r)}$$

and so increases without limit. In such a case we cannot obtain an estimate of the variance
of any random element which may be present.
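The growth of this quotient is easily illustrated. The sketch below, with hypothetical function names, differences the alternating series directly and compares the quotient with the limiting form; it assumes nothing beyond the definitions in the paragraph above.

```python
import math


def difference(series):
    """First differences u[t+1] - u[t]."""
    return [b - a for a, b in zip(series, series[1:])]


def mean_square_after(r: int, length: int = 40) -> float:
    """Mean square of the r-th differences of the series 1, -1, 1, -1, ..."""
    series = [(-1) ** t for t in range(length)]
    for _ in range(r):
        series = difference(series)
    return sum(x * x for x in series) / len(series)


def quotient(r: int) -> float:
    """The factor 2^(2r) / C(2r, r) by which the estimate is inflated."""
    return 4 ** r / math.comb(2 * r, r)


for r in (1, 2, 5, 10, 20):
    print(r, mean_square_after(r), round(quotient(r), 3),
          round(math.sqrt(math.pi * r), 3))
```

The quotient stays close to the square root of πr and grows with r, so that successive "estimates" never settle to a stationary level.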


NOTES AND REFERENCES 


References to the fitting of polynomials are given at the end of Chapter 22. For the 
moving average see Whittaker and Robinson’s Calculus of Observations and the books by 
Macaulay (1931) and Sasuly (1934). 

Attempts have been made to use trend-lines for purposes of forecasting, and even to 
measure the standard error of a forecast—see Schultz (1930) and a discussion in Davis 
(1941). The methods proposed appear to me theoretically unsound and in practice they 
lead as a rule to such wide limits of error as to be of doubtful value ; but this is a personal 
opinion and the less sceptical reader may care to consult Davis’s book and to follow up 
the references given therein. 

For the effect of moving averages on random variables see Yule (1921) and Slutzky 
(1937b), the latter being an English version of a paper published in Russian many years
earlier. See also Dodd (1939a, 1941a). Slutzky proves an interesting theorem—the 
theorem of the sinusoidal limit—to the effect that repeated moving averages of certain 
kinds applied to random series generate a sine-curve. 

For the variate-difference method see the book by Tintner (1940), a very thorough 
practical account with useful tables. The more important earlier memoirs are those by 
Anderson (1914, 1923, 1926), “Student” (1914), Morant (1921), and K. Pearson and
Cave (1914). 


EXERCISES 


29.1. Show that in the formulae of equation (29.7) and similar formulae of higher 
orders the sum of the weights is unity. 


29.2. By evaluating the solutions of (29.5) determinantally show that a parabolic
curve of second or third order giving a graduation

$$a_{-n}u_{-n} + a_{-(n-1)}u_{-(n-1)} + \dots + a_0u_0 + \dots + a_nu_n$$

has

$$a_j = \frac{3\{3n^2 + 3n - 1 - 5j^2\}}{(2n - 1)(2n + 1)(2n + 3)}.$$


29.3. Show that the weights in the Spencer 21-point formula are

$$\frac{1}{350}\,[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, 57, \ldots]$$

and that if it is applied to a random series the variance of the resultant is about one-seventh
of the original series, about the same reduction as would be given by a simple moving
average of sevens.
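A numerical check of the second part of this exercise may be sketched as follows. It assumes the symmetric Spencer weights as read from the exercise (the second half mirrors the first about the central weight 60) and uses the result of 29.29 that the reducing factor is the sum of squares of the weights; the variable names are hypothetical.

```python
from fractions import Fraction

# First ten weights (numerators over 350), then the central weight 60;
# the remaining ten are assumed to mirror the first ten.
half = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57]
numerators = half + [60] + half[::-1]

weights = [Fraction(w, 350) for w in numerators]
weight_sum = sum(weights)
reducing_factor = sum(w * w for w in weights)

print(len(weights), weight_sum, float(reducing_factor), 1 / 7)
```

The factor comes out a little above 0.143, against 1/7 = 0.1429 for a simple average of sevens.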


29.4. Show that Macaulay’s 43-point formula, 
1 7 
eis | 79 110.0, 0, 0, 0, 0, 1,... 1 
ago H2108) 6 | 3G ae. | 
has weights 
1 

came AO 4ge ees, — 60, —= 122, 178, 206, — 190, — 127, 

5g05 [> 18; 80, 0 
— 6, 163, 360, 562, 760, 928, 1050, 1127, 1156, . . .] 


and that it reduces the variance of a random series about as much as a simple average 
of nines. 


29.5. Take a random series of, say, 200 terms and determine “trends” by moving
averages $\frac{1}{9}[9]$, $\frac{1}{81}[9]^2$ and $\frac{1}{729}[9]^3$. Compare the mean distances between peaks and
upcrosses with the theoretical values based on normal theory.


29.6. If $\epsilon_t$ is a random series, show that the correlation between successive members
of $\Delta^k\epsilon_t$ for long series is $-\frac{k}{k+1}$ and hence tends to −1 as k increases. Hence show
that the signs of successive terms in $\Delta^ku_t$ tend to alternate, where $u_t$ is the sum of a random
element and a systematic element representable by a polynomial; and verify by reference
to Table 29.9.

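The stated correlation can be checked numerically before attempting the proof. The sketch below is only an illustration: it takes the weights of the k-th difference to be the signed binomial coefficients of (29.45) and, for an independent series, forms the ratio of the sum of products of adjacent weights to the sum of squared weights. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb


def lag_one_correlation(k: int) -> Fraction:
    """Correlation between successive k-th differences of an independent series."""
    weights = [(-1) ** j * comb(k, j) for j in range(k + 1)]
    numerator = sum(a * b for a, b in zip(weights, weights[1:]))
    denominator = sum(a * a for a in weights)
    return Fraction(numerator, denominator)


for k in range(1, 7):
    print(k, lag_one_correlation(k))
```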
29.7. By eliminating 6? from (29.19) show that, for a cubic curve, an accurate trend- 


line is given by 
1 ha al hk? —1 
a(t 


h? — k? h 
and generalise this result. 


(Cf. J. A. Higham, J. Inst. Act. (1882-5), 23, 335; 25, 15, 245.) 


CHAPTER 30 
TIME-SERIES—(2) 


30.1. The present chapter is devoted to a discussion of oscillatory effects in time- 
series. We shall suppose that our series is stationary, i.e. has no trend, either because the 
original data contained none or because trend has been removed by one of the methods 
described in the last chapter. Our typical series will then fluctuate round some constant 
value which we may usually, without loss of generality, take to be zero. We shall assume 
that there is a prior possibility that part of the variation at least is random. This, indeed, 


TABLE 30.1 


Trend-free Wheat-Price Index (European Prices) compiled by Sir William Beveridge for 
the Years 1500-1869. 


(From Beveridge, 1921.) 


[Table body not reproduced here; columns: Year, Index.]




is necessary if our results are to have any practical application, for most of the series 
encountered in practice have some element of irregularity, however small. 


30.2. Four examples of the type of series under consideration have already occurred. 
The table of Example 21.11 (page 126) gives the deviations from a simple nine-year moving 
average of the yields of potatoes in tenths of tons per acre in England and Wales for the 
years 1888-1935. Table 29.1 (Fig. 29.1) gives the annual yields of barley in cwts. per 
acre in England and Wales for 1884-1939, no nine-year elimination of trend having been 
carried out in this case. Table 29.4 (Fig. 29.4) gives rainfall data at London over the 
century 1813-1912. Table 29.5 (Fig. 29.5) gives egg-production per laying hen in the 
U.S.A. 


TABLE 30.2 


Marriage Rate in England and Wales: Deviation from a Simple 11-Year Moving Average 
for the Years 1843-1896. 


Units 1 in 10,000. 


Year.   Deviation.   Year.   Deviation.   Year.   Deviation.
1843 — 6 1861 5 1879 = 1 
44 1 62 = 80 — 6 
45 12 63 1 81 0 
46 10 64 6 82 5 
47 = 65 8 83 7 
48 —8 66 9 84 3 
49 | — 6 67 = & 85 — 4 
50 3 68 = 86 a= 
51 4 69 = NY 87 = 
52 7 70 = 7 88 — 5 
53 11 (a 0 89 1 
54 3 72 8 90 6 
55 =e 73 12 91 6 
56 = 74 u 92 2 
57 =e 75 5 93 = © 
58 = 7 76 4 94 = 
59 3 77 — 3 95 — 6 
60 4 78 ff) 96 il 




Tables 30.1 and 30.2 give two further examples. The first is a famous series of trend- 
free wheat-price indices compiled by Sir William Beveridge and extending over 370 years, 
a phenomenal length of time for economic series. The second is the deviation from a 
simple 11-year moving average of marriage rates for the years 1843-1896. 


Oscillation and Cycle 

30.3. We will now attempt to define more closely the sense in which we use the 
words “oscillation” and “cycle”. It is particularly important to exercise great care in
the use of an accurate nomenclature because a great deal of the literature on this subject 
suffers from confusion due to loose wording. 




By a cyclical component of a time-series we shall mean one which is a strictly periodic
function of the time, that is to say, for which there exists a period $\omega$ such that

$$u_t = u_{t+\omega} = u_{t+2\omega} = \dots = u_{t+k\omega} = \dots \qquad (30.1)$$

whatever the value of t. The periodic functions which we shall consider in particular are
the sine and cosine functions. If the series can be represented as the sum of a cyclical
component and a random constituent, or by a cyclical component alone, we may speak
of it as a cyclical series.


30.4. If the series is not random it must move with more or less regularity about 
the mean value, and we shall then speak of it as oscillatory. The oscillatory movement 
may be in part due to random elements but must not be entirely so. A cyclical series is 
oscillatory, but an oscillatory series is not necessarily cyclical. 

An oscillatory movement may be the sum of two or more cyclical components.
Consider, for instance, the sum of two periodic terms

$$u_t = \sin\frac{2\pi t}{\omega_1} + \sin\frac{2\pi t}{\omega_2}.$$

If $\omega_1$ and $\omega_2$ are commensurable there will be numbers, and in particular a smallest number
$\omega$, which is an exact multiple of both of them. This is clearly a period of the series.
But if $\omega_1$ and $\omega_2$ are not commensurable there will be no period of this kind and the sum
will be oscillatory but not cyclical.
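The commensurable case may be illustrated by a small sketch. It assumes, for simplicity, that both periods are whole numbers, so that the smallest common multiple is their least common multiple; the periods 4 and 6 and the function names are hypothetical choices made only for the illustration.

```python
import math


def common_period(w1: int, w2: int) -> int:
    """Smallest number that is an exact multiple of both integer periods."""
    return math.lcm(w1, w2)


def u(t: int, w1: int, w2: int) -> float:
    """Sum of two sine terms with periods w1 and w2."""
    return math.sin(2 * math.pi * t / w1) + math.sin(2 * math.pi * t / w2)


w1, w2 = 4, 6
period = common_period(w1, w2)
# The series repeats itself exactly after `period` terms.
repeats = all(abs(u(t + period, w1, w2) - u(t, w1, w2)) < 1e-9 for t in range(50))
print(period, repeats)
```

When the ratio of the two periods is irrational no such common multiple exists, and no shift of the series reproduces it exactly.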


30.5. It may be felt by the reader that we could reasonably extend the use of the 
word ‘ cyclical’ to cover series which are the sum of cyclical terms ; but the danger of 
doing so is that within certain limits any series can be represented as a sum of harmonic 
terms, even if it is not itself oscillatory, in virtue of Fourier’s theorem. Admittedly such 
a representation, to be exact, must in general consist of an infinite series of terms and is 
valid only in a certain range, but in practice a comparatively small number of terms often 
gives quite a good approximation. We do not call a function a polynomial because it 
can be expanded in powers of the variable by Taylor’s theorem ; and correspondingly 
we shall not call it cyclical because it can be expanded as a sum of harmonic terms by 
Fourier’s theorem. On the whole it seems safer to avoid the word “ cyclical” for series 
which consist of a finite number of cyclical terms. 


30.6. For our present purposes the main significance of the distinction we are attempt- 
ing to make is that in a cyclical series the maxima and minima, apart from disturbances 
due to the superposition of a random element, occur at equal intervals of time and are 
therefore predictable for a long way into the future—for so long, in fact, as the constitution 
of the system remains unchanged. In oscillatory series, on the other hand, the distances 
from peak to peak, trough to trough or upcross to upcross, are not equal, but vary very 
considerably. Similarly, in the oscillatory series the amplitudes of the movements may 
vary very substantially, whereas in a cyclical series they should be constant (again, except 
in so far as superposed random elements disturb them). 


30.7. Now the time-series observed in practice are very rarely cyclical as we have 
defined the term. The only case among those cited at the beginning of the chapter in which 
there appears to be any cyclical movement is that of egg-production per hen in Table 29.5. 
The far more usual case is that of varying amplitude and period from peak to peak or upcross
to upcross. We shall therefore begin our study of oscillatory movements by considering
the kinds of scheme which can give rise to the observed phenomena; and then we shall 
examine methods of deciding which of the possible schemes should be chosen as the 
hypothetical representation in particular cases. 


Tests for Randomness 


30.8. The first stage, when confronted with a fluctuating stationary series, is to
examine whether the fluctuations are purely random. Tests of randomness are easy to
find, and in fact the random series is the happy hunting-ground of the worker whose interests
lie mainly in the mathematics of the direct theory of probability. We have considered
some tests which are appropriate to the study of oscillatory movement in 21.43 to 21.46.
Others which have gained popularity are based on the distribution of “runs” and on the
correlation between successive members of the series. The reader will have no difficulty
in composing others. All these tests are based on the non-parametric case, so that the
alternative hypotheses are not usually brought specifically into view. We cannot therefore
apply the general theory of Chapters 26 and 27 to determine “best” tests, and in the
present state of knowledge are forced to be content with less definite ideas. So far as
ease of application goes, the tests of 21.43 and 21.44 seem to have decided advantages,
though they may be somewhat insensitive. The method of serial correlation, to which we
refer below, gives a useful alternative in doubtful cases. In the sequel we shall suppose
that before proceeding to search for systematic movements we have satisfied ourselves by
one or more of these tests that such movements exist.
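As an illustration of how simple such a test can be, here is a sketch of the familiar turning-point test. It is an assumption that this is a test of the kind referred to in 21.43 to 21.46; the function name is hypothetical. For a random series of n terms the expected number of turning points is 2(n - 2)/3 and the variance is (16n - 29)/90.

```python
from math import sqrt


def turning_point_test(u):
    """Count turning points and compare with the expectation for a random series."""
    n = len(u)
    # A turning point is a term strictly above or strictly below both neighbours.
    p = sum(1 for i in range(1, n - 1)
            if (u[i] - u[i - 1]) * (u[i + 1] - u[i]) < 0)
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    z = (p - mean) / sqrt(var)  # approximately a unit normal deviate for large n
    return p, mean, var, z


p, mean, var, z = turning_point_test([3, 8, 6, 2, 5, 9, 4, 7, 1, 6])
```

A large negative `z` (too few turning points) points towards systematic movement; ties between neighbouring terms are simply not counted in this sketch.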


30.9. We shall consider three schemes which can account for the typical oscillatory 
movement usually observed. 

(a) Moving Averages.—We have already seen in Chapter 29 that a moving average 
of a purely random element can generate an oscillatory series with all the required properties 
of varying amplitude and mean distances—the Slutzky-Yule effect (29.25). Fig. 29.6 
illustrates the kind of oscillation which may arise. It is at least possible that some of the 
observed oscillations in time-series may be generated in this way; and in fact Slutzky 
(1936) has given an interesting example in which a part of his series generated by the 
moving average happens to agree very closely with an observed series. 

(b) Sums of Cyclical Components. We may attempt, by Fourier analysis or the more
general harmonic analysis, to represent the oscillations as the sum of a number of cyclical
components. This is the classical approach.

(c) Autoregression Equations. If a series is constructed by the recurrence formula

$$u_{t+1} = f(u_t, u_{t-1}, \ldots, u_{t-k}) + \varepsilon_{t+1} \tag{30.2}$$

where $f$ is a mathematical function and $\varepsilon$ a “disturbance” function which may be a random variable, then under certain conditions the generated series is of the required type. We shall consider in particular the series

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2}, \tag{30.3}$$

where $a$ and $b$ are constants and $\varepsilon$ is random.

Table 30.3 (Fig. 30.1) shows a series of type (b) in the simplest case where only one cyclical component is involved, together with a random residual. Table 30.4 (Fig. 30.2) shows an autoregressive series constructed from random numbers by the formula

$$u_{t+2} = 1.1\, u_{t+1} - 0.5\, u_t + \varepsilon_{t+2}. \tag{30.4}$$
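The three schemes are easy to imitate artificially. The sketch below is not the construction actually used for Tables 30.3 and 30.4: the period of 10 units for the sine term, the zero starting values with a discarded burn-in, and the rounding of the output only are all assumptions, and the function names are illustrative.

```python
import math
import random


def cyclical_series(n, amplitude=10.0, period=10.0, half_range=5.0, seed=0):
    """Scheme (b): one sine term plus a rectangular random residual, rounded."""
    rng = random.Random(seed)
    return [round(amplitude * math.sin(2 * math.pi * t / period)
                  + rng.uniform(-half_range, half_range))
            for t in range(1, n + 1)]


def autoregressive_series(n, a=-1.1, b=0.5, half_range=9.5, seed=0, burn_in=50):
    """Scheme (c): u[t+2] = -a*u[t+1] - b*u[t] + e[t+2], rounded on output."""
    rng = random.Random(seed)
    u = [0.0, 0.0]  # arbitrary starting values, discarded with the burn-in
    for _ in range(n + burn_in):
        u.append(-a * u[-1] - b * u[-2] + rng.uniform(-half_range, half_range))
    return [round(x) for x in u[-n:]]


def moving_average_series(n, weights, half_range=1.0, seed=0):
    """Scheme (a): moving average of purely random elements."""
    rng = random.Random(seed)
    m = len(weights)
    e = [rng.uniform(-half_range, half_range) for _ in range(n + m - 1)]
    return [sum(w * e[j + i] for i, w in enumerate(weights)) for j in range(n)]


table_like_30_3 = cyclical_series(60)
table_like_30_4 = autoregressive_series(65)
```

With the default arguments, a = -1.1 and b = 0.5 reproduce the coefficients of (30.4); different seeds give different realisations of the same scheme.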


400 TIME-SERIES 


TABLE 30.3 


Values of the Series $u_t = 10 \sin\frac{\pi t}{5} + \varepsilon_t$, where $\varepsilon$ is a Rectangular Random Variable with Range -5 to +5, rounded off to Nearest Unit.

Number of Term.  Value of Series.  Number of Term.  Value of Series.  Number of Term.  Value of Series.
1 3 Pall il 41 5 
2 8 22 13 42 12 
3 6 oR} 10 43 7 
4 2 24 6 44 5 
5 = 7 25 — 5 45 3 
6 ey 26 — 8 46 = ¥ 
5 = 27 — 12 47 — 12 
8 —- 9 28 — 10 48 — 12 
9 — 10 29 — 7 49 — 8 
10 — | 30 0 50 — il 
11 | 8 31 1 51 11 
LPs f 32 8 52 13 
1183 6 33 13 53 12 
14 4 34 7 54 7 
15 — 3 35 4 55 5 
16 — 10 36 — 9 56 — | 
17 —1l 37 — 9 57 — 6 
18 — 15 38 | — 6 58 — lt 
19 — 4 39 — 4 59 = 08 
20 4 40 — 2 60 1 

Fig. 30.1. Graph of the Values of Table 30.3.


TABLE 30.4

Values of the Series $u_{t+2} = 1.1\, u_{t+1} - 0.5\, u_t + \varepsilon_{t+2}$, where $\varepsilon_{t+2}$ is a Rectangular Random Variable with Range -9.5 to 9.5, rounded off to Nearest Unit.


[The entries for terms 1 to 22 are illegible in the source.]

Number of Term.  Value of Series.  Number of Term.  Value of Series.
23 — 4 45 —13 
24 = 46 1 
25 — 9 47 6 
26 = 4 48 4 
27 — 4 49 11 
28 3 50 153 
29 9 yt 9 
30 4 52 8 
31 — 8 53 4 
32 — 6 54 -— 1 
33 — 3 55 4 
34 — 2 56 W 
35 0 57 1d] 
36 — l 58 0 
37 — 3 59 ii 
38 3 60 0 
39 — 1 61 — 5 
40 — 8 62 —ll 
41 - 3 63 — 8 
42 — 8 64 — 3 
43 — 10 65 5 
44 — 16 


Fig. 30.2. Graph of the Values of Table 30.4.




30.10. It is quite possible that theoretical reasons may suggest other schemes for 
study as the subject progresses. For instance, we might wish to consider series defined 
by differential equations, on the analogy of the similar equations determining oscillations 
in physical phenomena such as vibrating strings or electrical discharges. Something has, 
in fact, already been done in this direction. We shall, however, confine our attention 
to the three schemes indicated above, and particularly the second and third. 


30.11. On the face of it, an observed series exhibiting the typical movements in 
amplitude and period might be due to any one of the three schemes or even to a combination 
of them. We require, in the first instance, some objective criterion for deciding which of 
them is applicable in particular cases. Inspection of the primary data, though useful, is
quite an unreliable guide in making a decision on this point, particularly if the series 
is short. Experience seems to indicate that few things are more likely to mislead in the 
theory of oscillatory series than attempts to determine the nature of the oscillatory move- 
ment by mere contemplation of the series itself; and yet this is the method, if one can 
dignify it by such a term, which has perhaps been most widely used in the past. 


Serial Correlation 


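30.12. Suppose our series of values is $u_1, \ldots, u_n$. Let us form the product-moment correlation coefficient between successive terms, i.e.

$$r_1 = \frac{\operatorname{cov}(u_j, u_{j+1})}{(\operatorname{var} u_j \, \operatorname{var} u_{j+1})^{1/2}}. \tag{30.5}$$

There will be $(n - 1)$ pairs entering into the correlation, and the variances of $u_j$ and $u_{j+1}$ differ only in the fact that the first relates to the terms $u_1, u_2, \ldots, u_{n-1}$ and the second to the terms $u_2, u_3, \ldots, u_n$. The coefficient $r_1$ is called the serial correlation coefficient of the first order, or more briefly the first serial correlation.*

More generally, let us define a coefficient of order $k$:

$$r_k = \frac{\operatorname{cov}(u_j, u_{j+k})}{(\operatorname{var} u_j \, \operatorname{var} u_{j+k})^{1/2}} \tag{30.6}$$

$$= \frac{\dfrac{1}{n-k}\sum\limits_{j=1}^{n-k} u_j u_{j+k} - \dfrac{1}{(n-k)^2}\Big(\sum\limits_{j=1}^{n-k} u_j\Big)\Big(\sum\limits_{j=1}^{n-k} u_{j+k}\Big)}{\Big\{\dfrac{1}{n-k}\sum\limits_{j=1}^{n-k} u_j^2 - \dfrac{1}{(n-k)^2}\Big(\sum\limits_{j=1}^{n-k} u_j\Big)^2\Big\}^{1/2}\Big\{\dfrac{1}{n-k}\sum\limits_{j=1}^{n-k} u_{j+k}^2 - \dfrac{1}{(n-k)^2}\Big(\sum\limits_{j=1}^{n-k} u_{j+k}\Big)^2\Big\}^{1/2}}. \tag{30.7}$$

By convention we define

$$r_0 = 1, \qquad r_{-k} = r_k. \tag{30.8}$$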
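The definition (30.7) translates directly into a short routine. The sketch below is only one way of arranging the arithmetic; the name `serial_correlation` is illustrative, and the conventions of (30.8) are built in.

```python
from math import sqrt


def serial_correlation(u, k):
    """Serial correlation of order k, computed from the n - k available pairs."""
    k = abs(k)            # r_{-k} = r_k by convention
    if k == 0:
        return 1.0        # r_0 = 1 by convention
    m = len(u) - k        # number of pairs entering into the correlation
    x, y = u[:m], u[k:]   # first n - k terms and last n - k terms
    sx, sy = sum(x), sum(y)
    cov = sum(a * b for a, b in zip(x, y)) / m - sx * sy / m ** 2
    var_x = sum(a * a for a in x) / m - (sx / m) ** 2
    var_y = sum(b * b for b in y) / m - (sy / m) ** 2
    return cov / sqrt(var_x * var_y)


series = [2, 5, 7, 4, 1, 0, 3, 6, 8, 5, 2, 1]
correlogram = [serial_correlation(series, k) for k in range(6)]
```

Note that the means and variances are recomputed for each k from the terms actually paired, as the definition requires, rather than taken once from the whole series.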

30.13. In practice we often require to calculate serial correlations up to rz, and for 
long series as many as 60. The arithmetic is tedious but may be systematised so as to 
reduce labour, which arises chiefly in the determination of cross-products forming the 
covariances. 

The series of n terms is written down vertically on each of two slips of paper, the spacing being equal on the two slips. This can very conveniently be done on a Burroughs tabulator with a split keyboard, the series being recorded in duplicate and the resulting strip cut up the middle. To calculate the first product-sum we pin the slips so that the first term on the right-hand slip is opposite the second term on the left-hand slip, and hence so that the jth term on the right is opposite to the (j + 1)th on the left all the way down. For most series the differences of two terms which are opposite can be obtained mentally by subtraction, squared, and set up on an adding-machine. The sum of squares of differences is thus determined, and the cross-product found from the simple identity

$$\Sigma (XY) = \tfrac{1}{2}\{\Sigma (X^2) + \Sigma (Y^2) - \Sigma (X - Y)^2\}.$$

We then move the right-hand slip down one space so that the jth term is opposite the (j + 2)th term on the left and repeat the process; and so on to as many terms as may be required.

In this process $\Sigma (X^2)$ and $\Sigma (Y^2)$ are required at each stage, and it is as well to determine them by cumulative summation from the two ends of the series. $\Sigma (X)$ and $\Sigma (Y)$ are also required. It is also convenient on occasion to reduce the series to zero mean approximately before beginning the analysis.

* It is sometimes convenient to confine this expression to values calculated from samples, the corresponding values for the infinite series being termed “autocorrelations” and denoted by a Greek $\rho$.


Example 30.1
To illustrate the arithmetic we will take a very trivial example which the reader should check for himself. Take the series

-5, -6, -2, 4, 7, 3, 1, -5, -1, 2.

We set up the following scheme of tabulation for calculating serial correlations up to the fifth order:

          Σ(X)          Σ(Y)        Σ(X²)          Σ(Y²)
n-k   k   (from begin-  (from end   (from          (from    Σ(X-Y)²   Σ(XY)
          ning of       of series)  beginning)     end)
          series)
10    0      -2            -2         170           170         0       170
 9    1      -4             3         166           145       143        84
 8    2      -3             9         165           109       344       -35
 7    3       2            11         140           105       445      -100
 6    4       1             7         139            89       380       -76
 5    5      -2             0         130            40       172        -1

The number n - k is the number of pairs entering into the kth correlation. Σ(X) is the sum of n - k terms beginning at the first term, Σ(Y) the corresponding sum of the last n - k terms, and similarly for Σ(X²) and Σ(Y²). These are the quantities required to calculate the variances entering into the denominator of the kth serial correlation. The quantities Σ(X - Y)² are calculated by the moving-slip method described above.

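We now calculate the correlation coefficients in the usual way, e.g. for $r_1$

$$\operatorname{var} X = \frac{166}{9} - \Big(\frac{-4}{9}\Big)^2 = 18.247$$

$$\operatorname{var} Y = \frac{145}{9} - \Big(\frac{3}{9}\Big)^2 = 16.000$$

$$\operatorname{cov}(X, Y) = \frac{84}{9} - \Big(\frac{-4}{9}\Big)\Big(\frac{3}{9}\Big) = 9.4815$$

$$r_1 = \frac{9.4815}{\sqrt{18.247 \times 16}} = 0.555;$$

and for $r_5$

$$\operatorname{var} X = \frac{130}{5} - \Big(\frac{-2}{5}\Big)^2 = 25.840$$

$$\operatorname{var} Y = \frac{40}{5} - \Big(\frac{0}{5}\Big)^2 = 8.000$$

$$\operatorname{cov}(X, Y) = \frac{-1}{5} - \Big(\frac{-2}{5}\Big)\Big(\frac{0}{5}\Big) = -0.200$$

$$r_5 = -0.01.$$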
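The tabulation of Example 30.1 can be reproduced mechanically. In the sketch below the series is taken as -5, -6, -2, 4, 7, 3, 1, -5, -1, 2, which are the values consistent with the tabulated sums; the function names are illustrative. The cross-product is obtained from the sum of squared differences exactly as in the moving-slip method.

```python
def moving_slip_table(u, max_k):
    """Rows (n-k, k, sum X, sum Y, sum X^2, sum Y^2, sum (X-Y)^2, sum XY)."""
    n = len(u)
    rows = []
    for k in range(max_k + 1):
        m = n - k
        x, y = u[:m], u[k:]                      # the two slips, offset by k
        sxx = sum(a * a for a in x)
        syy = sum(b * b for b in y)
        sdd = sum((a - b) ** 2 for a, b in zip(x, y))
        sxy = (sxx + syy - sdd) / 2              # identity used on the slips
        rows.append((m, k, sum(x), sum(y), sxx, syy, sdd, sxy))
    return rows


def r_from_row(row):
    """Serial correlation from one row of the tabulation."""
    m, _, sx, sy, sxx, syy, _, sxy = row
    cov = sxy / m - (sx / m) * (sy / m)
    var_x = sxx / m - (sx / m) ** 2
    var_y = syy / m - (sy / m) ** 2
    return cov / (var_x * var_y) ** 0.5


series = [-5, -6, -2, 4, 7, 3, 1, -5, -1, 2]
table = moving_slip_table(series, 5)
r1, r5 = r_from_row(table[1]), r_from_row(table[5])
```

The row for k = 0 gives a zero variance denominator only if the series is constant; for the series above `r_from_row(table[0])` is simply 1.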
When n is large and the origin is chosen so that the mean of the whole series is approximately zero, a sufficiently good value of r is given by

$$r_k = \frac{\Sigma (XY)}{\{\Sigma (X^2)\, \Sigma (Y^2)\}^{1/2}},$$

the corrections required to adjust the sums of squares and products to values about the mean being small; but this approximation must be used with some care and in any case the first two or three serial coefficients should be worked out exactly.


The Correlogram 


30.14. The diagram obtained by graphing $r_k$ as ordinate against k as abscissa and
joining the points each to the next is called a correlogram. We shall give a number of
examples below and shall see that the form of the correlogram provides a method of
discriminating between the various types of oscillatory series.


30.15. Suppose, for example, that the series is generated by a moving average of random elements with weights $a_1, a_2, \ldots, a_m$. The typical term of the series is then

$$u_j = a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \ldots + a_m \varepsilon_{j+m-1}. \tag{30.9}$$

Without loss of generality we may take $E(\varepsilon) = 0$ and hence $E(u_j) = 0$. Then

$$E(u_j u_{j+k}) = E\{(a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \ldots + a_m \varepsilon_{j+m-1})(a_1 \varepsilon_{j+k} + a_2 \varepsilon_{j+k+1} + \ldots + a_m \varepsilon_{j+k+m-1})\}.$$

Since

$$E(\varepsilon_j \varepsilon_{j+k}) = 0, \quad k \neq 0; \qquad E(\varepsilon_j \varepsilon_{j+k}) = \operatorname{var} \varepsilon, \quad k = 0,$$

we have

$$E(u_j u_{j+k}) = (a_1 a_{k+1} + a_2 a_{k+2} + \ldots + a_{m-k} a_m) \operatorname{var} \varepsilon, \tag{30.10}$$

provided that $m > k$. But if $k \geq m$ then

$$E(u_j u_{j+k}) = 0. \tag{30.11}$$

Thus for an infinite series generated by the moving average the serial correlations vanish for $k \geq m$, and the correlogram from that point onwards coincides with the x-axis. In particular, if the a’s are all equal to 1/m, we have

$$E(u_j u_{j+k}) = \frac{m - k}{m^2} \operatorname{var} \varepsilon$$

and hence

$$r_k = 1 - \frac{k}{m}, \tag{30.12}$$

so that the correlogram consists of a straight line joining the point (0, 1) to (m, 0), together with the x-axis from the latter point onwards.


Example 30.2
The weights of the Spencer 21-point formula are

$$\frac{1}{350}\{-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \ldots\}.$$

Apart from the divisor 350, which may be disregarded for present purposes, the sum of squares of weights is 17,542. The products (30.10) and the corresponding serial correlations are as follows:

 k    Σ a_j a_{j+k}    r_k        k    Σ a_j a_{j+k}    r_k
 0       17,542        1.000      11       -930        -0.053
 1       16,786        0.957      12       -528        -0.030
 2       14,667        0.836      13       -214        -0.012
 3       11,584        0.660      14        -27        -0.002
 4        8,085        0.461      15         50         0.003
 5        4,726        0.269      16         59         0.003
 6        1,951        0.111      17         40         0.002
 7            6        0.000      18         19         0.001
 8       -1,074       -0.061      19          6         0.000
 9       -1,430       -0.082      20          1         0.000
10       -1,298       -0.074      21          0         0.000

Fig. 30.3. Correlogram of Series generated by the Spencer 21-point Formula (Example 30.2).


The correlogram is shown in Fig. 30.3. From k = 13 onwards the correlations are very
small, and from k = 21 onwards they vanish completely.
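The products of (30.10) are quickly verified by machine. The following sketch computes the theoretical correlogram of any moving average from its weights; the function name `ma_correlogram` is illustrative, and the Spencer weights are completed by symmetry about the central weight 60.

```python
def ma_correlogram(weights, max_k):
    """Theoretical r_k for a moving average of random elements, from (30.10)."""
    m = len(weights)
    s0 = sum(w * w for w in weights)             # sum of squares of the weights
    r = []
    for k in range(max_k + 1):
        if k < m:
            prod = sum(weights[j] * weights[j + k] for j in range(m - k))
        else:
            prod = 0                             # no overlap: (30.11)
        r.append(prod / s0)
    return r


half = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57]
spencer = half + [60] + half[::-1]               # 21 weights, divisor 350 omitted
r_spencer = ma_correlogram(spencer, 25)
r_equal = ma_correlogram([1 / 5] * 5, 7)         # equal weights: r_k = 1 - k/5
```

A common divisor such as 350 cancels between numerator and denominator, which is why it may be disregarded.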




30.16. Suppose now that the series consists of a sine term $A \sin \theta t$ plus $\varepsilon_t$, a random residual. As before, we may suppose $E(u_t) = 0$, and hence

$$E(u_j u_{j+k}) = E\{(A \sin \theta j + \varepsilon_j)(A \sin \theta (j + k) + \varepsilon_{j+k})\}$$

$$= A^2 E\{\sin \theta j \, \sin \theta (j + k)\}$$

$$= \frac{A^2}{n} \sum_{j=1}^{n} \{\sin \theta j \, \sin \theta (j + k)\} \tag{30.13}$$

$$= \frac{A^2}{2n} \sum_{j=1}^{n} \{\cos \theta k - \cos \theta (2j + k)\}$$

$$= \frac{A^2}{2} \cos \theta k - \frac{A^2 \cos \theta (k + n + 1) \sin n\theta}{2n \sin \theta}. \tag{30.14}$$

Thus for large n we have effectively, unless $\theta$ is small,

$$E(u_j u_{j+k}) = \frac{A^2}{2} \cos \theta k = B \cos \theta k, \text{ say.} \tag{30.15}$$

Similarly we find

$$E(u^2) = B + \operatorname{var} \varepsilon = C, \text{ say.} \tag{30.16}$$

Hence

$$r_k = \frac{B}{C} \cos \theta k, \qquad k \neq 0. \tag{30.17}$$

In short, for an infinite cyclical series the correlogram itself is a harmonic with period equal to that of the original harmonic component.
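The summation leading to (30.14) can be checked numerically for particular values. This is a minimal sketch with illustrative function names; the numerical values of A, θ and var ε are arbitrary choices, not taken from the text.

```python
from math import cos, sin


def lagged_product_mean(A, theta, k, n):
    """Direct evaluation of (A^2 / n) * sum of sin(theta j) sin(theta (j + k))."""
    return A * A / n * sum(sin(theta * j) * sin(theta * (j + k))
                           for j in range(1, n + 1))


def closed_form(A, theta, k, n):
    """Right-hand side of (30.14)."""
    return (A * A / 2 * cos(theta * k)
            - A * A * cos(theta * (k + n + 1)) * sin(n * theta) / (2 * n * sin(theta)))


def cyclical_correlogram(A, var_eps, theta, k):
    """Limiting serial correlation (B / C) cos(theta k) of (30.17), for k != 0."""
    B = A * A / 2
    return B / (B + var_eps) * cos(theta * k)


direct = lagged_product_mean(2.0, 0.7, 3, 1000)
exact = closed_form(2.0, 0.7, 3, 1000)
```

The second term of `closed_form` is of order 1/(n sin θ), which is why it is negligible for large n unless θ is small.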


30.17. When the original series is the sum of several harmonic terms the formula
for $r_k$ will, in general, be the sum of harmonics, not necessarily with the same periods.
Thus the correlogram will present a sinusoidal form which will not degenerate to the x-axis
after some fixed point and will not, in fact, be damped.


30.18. Consider now the series defined by (30.3), namely

$$u_{t+2} = -a u_{t+1} - b u_t + \varepsilon_{t+2}.$$

This is a difference equation which is easily solved by the usual methods.* The general solution of

$$u_{t+2} + a u_{t+1} + b u_t = 0 \tag{30.18}$$

is

$$u_t = p^t (A \cos \theta t + B \sin \theta t) \tag{30.19}$$

where

$$p = +\sqrt{b}, \qquad \cos \theta = -\frac{a}{2\sqrt{b}}. \tag{30.20}$$

Here $\sqrt{b}$ is to be taken with positive sign, and it is assumed that $4b > a^2$. We also assume that $\sqrt{b}$ is not greater than unity. The contrary case is mathematically permissible, but it implies that $u_t$ increases without limit, which is outside the domain of our consideration.


* See, for instance, Milne-Thomson, Calculus of Finite Differences, chapter 13. 


Consider now the series

$$\sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}, \tag{30.21}$$

where $\xi_t$ is a particular solution of the form (30.19) such that $\xi_0 = 0$ and $\xi_1 = 1$, i.e. such that

$$\xi_t = \frac{2}{\sqrt{4b - a^2}}\, p^t \sin \theta t. \tag{30.22}$$

On substituting (30.21) in the original equation it will be found to provide a particular solution. The general solution is then

$$u_t = p^t (A \cos \theta t + B \sin \theta t) + \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}. \tag{30.23}$$

As p is not greater than unity we shall, in general, find that the first term in this expression is damped out of existence. If we may regard our series as having been “started up” some time prior to the point t = 0, the solution is effectively

$$u_t = \sum_{j=0}^{\infty} \xi_j \varepsilon_{t-j+1}. \tag{30.24}$$


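30.19. In this form the autoregressive scheme is seen to be a moving average of a component $\varepsilon$ with infinite extent and damped harmonic weights. Consider now its correlogram. We have

$$\sum_{j=0}^{\infty} \xi_j \xi_{j+k} = \frac{4}{4b - a^2} \sum_{j=0}^{\infty} \{p^{2j+k} \sin \theta j \, \sin \theta (j + k)\}$$

$$= \frac{2p^k}{4b - a^2} \sum_{j=0}^{\infty} p^{2j} \{\cos \theta k - \cos \theta (2j + k)\}$$

$$= \frac{2p^k}{4b - a^2} \left\{ \frac{\cos \theta k}{1 - p^2} - \frac{\cos \theta k - p^2 \cos \theta (k - 2)}{1 - 2p^2 \cos 2\theta + p^4} \right\}. \tag{30.25}$$

Now

$$E(u_t u_{t+k}) = E\Big\{\sum_j (\xi_j \varepsilon_{t-j+1}) \sum_j (\xi_j \varepsilon_{t+k-j+1})\Big\} = \operatorname{var} \varepsilon \sum_{j=0}^{\infty} \xi_j \xi_{j+k}. \tag{30.26}$$

Thus

$$r_k = \frac{\operatorname{var} \varepsilon \sum (\xi_j \xi_{j+k})}{\operatorname{var} \varepsilon \sum (\xi_j^2)},$$

which, on substitution from (30.25), reduces to

$$r_k = \frac{p^k}{(1 + p^2) \sin \theta} \{\sin (k + 1)\theta - p^2 \sin (k - 1)\theta\}. \tag{30.27}$$

Writing

$$\tan \psi = \frac{1 + p^2}{1 - p^2} \tan \theta, \tag{30.28}$$

we find

$$r_k = \frac{p^k \sin (k\theta + \psi)}{\sin \psi}, \qquad k \geq 0. \tag{30.29}$$

From this we see that the correlogram will oscillate with period $2\pi/\theta$, but that, owing to the factor $p^k$, it will be damped. If k is negative the formula applies, except that $|k|$ must be used instead of k on the right-hand side of (30.29).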
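The damped form (30.29) may be evaluated for given a and b as in the following sketch. The function name is illustrative, and the comparison with the recurrence r_k = -a r_{k-1} - b r_{k-2} (starting from r_0 = 1 and r_1 = -a/(1 + b)) is offered as an independent check which is not part of the derivation above.

```python
from math import acos, atan2, cos, sin, sqrt


def ar2_correlogram(a, b, max_k):
    """Theoretical correlogram of u[t+2] + a u[t+1] + b u[t] = e[t+2], by (30.29)."""
    if not (0 < b <= 1 and 4 * b > a * a):
        raise ValueError("requires 0 < b <= 1 and 4b > a^2")
    p = sqrt(b)
    theta = acos(-a / (2 * p))
    # tan(psi) = (1 + p^2) / (1 - p^2) * tan(theta); atan2 also covers p = 1.
    psi = atan2((1 + p * p) * sin(theta), (1 - p * p) * cos(theta))
    return [p ** k * sin(k * theta + psi) / sin(psi) for k in range(max_k + 1)]


r = ar2_correlogram(-1.1, 0.5, 30)   # the scheme of formula (30.4)

# Independent check by the recurrence satisfied by the autocorrelations.
check = [1.0, 1.1 / 1.5]
for k in range(2, 31):
    check.append(1.1 * check[-1] - 0.5 * check[-2])
```

For the scheme of (30.4) the factor p^k falls to about 0.00003 by k = 30, so the theoretical correlogram is by then indistinguishable from the x-axis.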


30.20. We thus reach the interesting conclusion that the three types of series con- 
sidered in 30.9, however similar to the eye, will have distinct types of correlogram, pro- 
vided that the series are long enough for the observed correlations to approach the expected 
values for an infinite series. The correlogram of a series generated by moving averages, 
though it may oscillate as in Example 30.2, will vanish after a certain point; that of a 
series of harmonic terms will oscillate, but will not vanish or be damped ; that of the auto- 
regressive scheme will oscillate and will not vanish, but it will be damped. The correlogram 
therefore offers a theoretical basis for discriminating between the three types of oscillatory 
series. 


30.21. Unfortunately the series with which we have to work are very frequently 
too short to enable a decisive distinction to be made. We shall see below that divergence 
between theory and observation can be very considerable, and that sampling theory has 
not yet advanced far enough to enable us to make objective judgments in probability 
about its significance. We shall have to rely on limited experimental evidence and to 
some extent on intuitive judgment in reaching conclusions. If, therefore, the remainder 
of this chapter contains gaps in the treatment and leaves certain points undecided the 
reader will understand that the reason is ignorance rather than indifference. 


Examples of Correlograms from Observed Series 

30.22. We will in the first place give the correlograms of a few of the series given 
earlier in this and the preceding chapter. 
Example 30.3 


In Table 30.2 we gave the deviations from the trend of marriage rates for the years
1843-1896. The first 20 serial correlations of this series are shown in Table 30.5 and the
correlogram in Fig. 30.4.




TABLE 30.5
Serial Correlations of the Marriage Data of Table 30.2.

Order of Correlation k.     r_k
         1                 0.563
         2                -0.089
         3                -0.498
         4                -0.631
         5                -0.467
         6                -0.025
         7                 0.353
         8                 0.396
         9                 0.254
        10                 0.104

[The values of r_k for k = 11 to 20 are illegible in the source.]

Fig. 30.4. Correlogram of Marriage Data of Table 30.2 (Table 30.5).


The correlogram is smooth and suggests the operation of an autoregressive scheme. 
There is little indication that a moving average, at least of extent less than 20, would account 
for the series, but on the other hand some damping appears to be present. 


Example 30.4 
Table 30.6 shows the first 60 serial correlations of the Beveridge series of Table 30.1, 
the correlogram being given in Fig. 30.5. 


TABLE 30.6
Serial Correlations of the Beveridge Wheat-Price Index of Table 30.1.

Order of Correlation k.   r_k       k.    r_k       k.    r_k       k.    r_k
         1               0.562     16    0.158     31    0.060     46   -0.036
         2               0.103     17    0.109     32   -0.008     47   -0.013
         3              -0.075     18    0.002     33   -0.039     48    0.042
         4              -0.092     19   -0.075     34    0.007     49    0.062
         5              -0.082     20   -0.062     35    0.056     50    0.065
         6              -0.136     21   -0.021     36    0.010     51    0.050
         7              -0.211     22   -0.062     37   -0.004     52    0.009
         8              -0.261     23   -0.088     38   -0.015     53   -0.027
         9              -0.192     24   -0.084     39   -0.047     54   -0.053
        10              -0.070     25   -0.076     40   -0.047     55   -0.073
        11              -0.003     26   -0.091     41    0.008     56   -0.106
        12              -0.015     27   -0.052     42    0.034     57   -0.084
        13              -0.012     28   -0.032     43    0.065     58   -0.019
        14               0.047     29   -0.012     44    0.099     59    0.003
        15               0.101     30    0.059     45    0.009     60    0.010


Fig. 30.5. Correlogram of the Beveridge Series of Table 30.1 (Table 30.6).


The correlogram here is almost certainly damped. The oscillations persist in a most 
remarkable way, notwithstanding the diminishing amplitude, and the presumption is 
a strong one that the series is of the damped type. 




Example 30.5 

In Table 29.8 (page 386) we gave the residuals of a sheep-population series for the 
years 1871 to 1935. Table 30.7 shows the first 30 serial correlations of this series and 
Fig. 30.6 the correlogram. Again the correlogram is oscillatory, but the damping is not 
so clear. 


TABLE 30.7
Serial Correlations of the Sheep Data of Table 29.8.

Order of Correlation k.   r_k       k.    r_k       k.    r_k
         1               0.595     11   -0.142     21   -0.381
         2              -0.151     12   -0.172     22   -0.118
         3              -0.601     13   -0.186     23    0.173
         4              -0.537     14   -0.128     24    0.343
         5              -0.138     15    0.052     25    0.352
         6               0.144     16    0.276     26    0.154
         7               0.203     17    0.439     27   -0.203
         8               0.118     18    0.293     28   -0.456
         9               0.006     19   -0.074     29   -0.415
        10              -0.078     20   -0.359     30   -0.184


Fig. 30.6. Correlogram of the Sheep Population Data of Table 29.8 (Table 30.7).




Significance of a Correlogram 


30.23. The foregoing examples illustrate one of the main difficulties we have to face 
in correlogram analysis. On intuitive grounds we seem to be justified in rejecting the 
scheme of moving averages as a possible scheme for the series of these examples, since the 
oscillations in the correlograms persist ; but we can no doubt find moving averages which 
will produce such correlograms, though their extents would have to be long (over 60 in 
the case of the Beveridge series) and their weights artificial. The only final test seems to 
be to ascertain such a moving average and then to examine whether it will predict further 
terms in the series if such can be observed. 


30.24. Distinction between the scheme of harmonic components and the auto- 
regressive scheme is even more difficult for short series, since the correlograms for the 
latter do not damp out according to expectation. Consider in fact an autoregressive 
scheme of the simple linear type (30.3). There will be the usual variation in length from 
peak to peak and in amplitude ; but if the section of the series is a comparatively short 
one, covering, say, four or five oscillations, the oscillations will not have time to get very 
much out of step and the serial correlations will be systematically larger than one would 
expect for an infinite series. This effect is exhibited in Table 30.8 and Fig. 30.7, which 
give the serial correlations and the correlogram for the series of Table 30.4, given by the 
formula 

$$u_{t+2} = 1.1\, u_{t+1} - 0.5\, u_t + \varepsilon_{t+2}.$$

Here the damping factor $p = \sqrt{b} = 0.7071$, and by the thirtieth correlation $r_k$ should be
very small, less than 0.002 in absolute magnitude. Actually it is 100 times as large. The
mere fact that an observed correlogram for a short series fails to damp very rapidly is 
not, therefore, a very definite indication that the series is not ruled by the autoregressive 
scheme. On the contrary, failure to damp may be expected. 


30.25. We are on firmer ground when considering the significance of a correlogram 
in the sense of judging whether it can be derived from a random series. 


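(a) The variance of $r_k$ in a random series of n terms is approximately $\dfrac{1}{n - k}$, provided that n is large. For

$$E\Big\{\frac{1}{n-k} \sum_{j=1}^{n-k} (x_j x_{j+k})\Big\}^2 = \frac{1}{(n-k)^2} E\Big\{\sum (x_j^2 x_{j+k}^2) + \sum_{j \neq m} (x_j x_{j+k} x_m x_{m+k})\Big\}$$

$$= \frac{1}{(n-k)^2} E \sum (x_j^2 x_{j+k}^2)$$

$$= \frac{1}{n-k} \operatorname{var}^2 x.$$

Hence, for large samples,

$$\operatorname{var} r_k = \frac{1}{n-k} \frac{\operatorname{var}^2 x}{\operatorname{var}^2 x} = \frac{1}{n-k}. \tag{30.30}$$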

R. L. Anderson (1942) has recently given exact results for the significance of a serial 
correlation. 

(b) For our purposes, however, the important point is not whether a particular serial coefficient is significant, but whether the oscillatory character of the correlogram as a whole is so. Here we have to form an intuitive judgment, but it can hardly be doubted that the undulations in Figs. 30.4 to 30.6 are not accidental. Something exists to be explained as a systematic effect, though what that effect is may be more difficult to decide.

TABLE 30.8
Serial Correlations of the Artificial Series of Table 30.4.

Order of Correlation k.   r_k      k.    r_k      k.    r_k
         1               0.70     11   -0.05     21    0.05
         2               0.29     12   -0.17     22   -0.12
         3               0.01     13   -0.27     23   -0.28
         4              -0.17     14   -0.31     24   -0.43
         5              -0.27     15   -0.30     25   -0.57
         6              -0.25     16   -0.18     26   -0.56
         7              -0.13     17    0.12     27   -0.26
         8               0.07     18    0.29     28    0.02
         9               0.12     19    0.33     29    0.17
        10               0.05     20    0.22     30    0.27

Fig. 30.7. Correlogram of the Artificial Series of Table 30.4 (Table 30.8).
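The approximation var r_k = 1/(n - k) for a random series can be examined by experiment. The sketch below is a rough sampling experiment only: the rectangular random elements, the number of trials and the seed are arbitrary assumptions, and the function names are illustrative.

```python
import random
from math import sqrt


def serial_r(u, k):
    """Serial correlation of order k from the n - k available pairs."""
    m = len(u) - k
    x, y = u[:m], u[k:]
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(a * b for a, b in zip(x, y)) / m - mx * my
    vx = sum(a * a for a in x) / m - mx * mx
    vy = sum(b * b for b in y) / m - my * my
    return cov / sqrt(vx * vy)


def simulated_variance(n, k, trials, seed=1):
    """Sampling variance of r_k over many purely random series of n terms."""
    rng = random.Random(seed)
    values = [serial_r([rng.uniform(-1, 1) for _ in range(n)], k)
              for _ in range(trials)]
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / trials


observed = simulated_variance(100, 1, 2000)
expected = 1 / (100 - 1)   # the large-sample value of (30.30)
```

The two figures should agree only roughly, since (30.30) is a large-sample result and the experiment is itself subject to sampling error.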


30.26. We shall proceed to study the autoregressive scheme and the scheme of
cyclical components in more detail, without prejudice for the time being to the question
as to which is the better representation in particular cases. This latter is not, in fact,
entirely a statistical matter, and we shall return to it in 30.39.




The Autoregressive Scheme 

30.27. We consider in the first instance the simplified scheme of equation (30.3). The theoretical correlogram for a series generated by this equation is of the damped type given by (30.29),

$$r_k = \frac{p^k \sin (k\theta + \psi)}{\sin \psi},$$

where $2\pi/\theta$ is the autoregressive period of the regression equation and is given by

$$\cos \theta = -\frac{a}{2\sqrt{b}}.$$

The typical series of this kind has no “period” in the strict sense. The lengths from peak to peak or from upcross to upcross vary in the characteristic way. It appears from experiment (but has not, I think, been shown theoretically) that the distribution of distances from peak to peak is of the unimodal type with a central value somewhere near the mean distance between peaks; and similarly for troughs and upcrosses. In speaking of the “period” of an autoregressive series we mean the central value of one of these distributions. The question we have now to consider is whether this period is the same as the autoregressive period $2\pi/\theta$ of the regression equation.


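30.28. We have seen in 29.26 that the mean distance between upcrosses of the series generated by the moving average whose weights are $\xi_1, \ldots, \xi_m$ is given by $2\pi/\phi$, say, where

$$\cos \phi = \frac{\sum \xi_j \xi_{j+1}}{\sum \xi_j^2}.$$

Substituting for $\xi$ from (30.22) and using (30.25), we find

$$\cos \phi = \frac{\dfrac{2p}{4b - a^2} \left\{ \dfrac{\cos \theta}{1 - p^2} - \dfrac{\cos \theta\,(1 - p^2)}{1 - 2p^2 \cos 2\theta + p^4} \right\}}{\dfrac{2}{4b - a^2} \left\{ \dfrac{1}{1 - p^2} - \dfrac{1 - p^2 \cos 2\theta}{1 - 2p^2 \cos 2\theta + p^4} \right\}}$$

$$= \frac{2p \cos \theta}{1 + p^2}$$

$$= -\frac{a}{1 + b}. \tag{30.31}$$

Thus the mean period as defined by upcrosses is

$$2\pi \Big/ \operatorname{arc\,cos} \Big( \frac{-a}{1 + b} \Big), \tag{30.32}$$

whereas that for the autoregressive period of the equation is

$$2\pi \Big/ \operatorname{arc\,cos} \Big( \frac{-a}{2\sqrt{b}} \Big). \tag{30.33}$$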




30.29. The mean period between upcrosses is thus not the same as the autoregressive period. The two are very close for many of the values of a and b arising in practice. For instance, when b = 1 they are identical; when a = -1, b = 0.5 their ratio is 1.07. One might infer that an estimate of the period of an autoregressive scheme can be obtained from the correlogram, but this generalisation requires some important qualifications.

(a) Firstly, the ratio of (30.33) to (30.32) is not necessarily close to unity for values of b in the neighbourhood of $a^2/4$, i.e. when $\theta$ is small and the autoregressive period is long. Consider, for instance, the series generated by

$$u_{t+2} = 1.2\, u_{t+1} - 0.4\, u_t + \varepsilon_{t+2}.$$

We have

$$\cos \theta = -\frac{a}{2\sqrt{b}} = \frac{1.2}{2\sqrt{0.4}} = 0.9487, \qquad \theta = 18.4°, \qquad \text{period} = 19.5 \text{ units.}$$

However, for $\phi$,

$$\cos \phi = \frac{1.2}{1.4} = 0.8571, \qquad \phi = 31.0°, \qquad \text{period} = 11.6 \text{ units.}$$

The mean distance between upcrosses, and a fortiori that between peaks, is very much shorter than the autoregressive period.

(b) The mean distance between upcrosses may miss certain oscillations above or
below the x-axis, so that it overestimates the period between peaks or troughs. On the
other hand, the latter may include ripples on the main wave which we wish to ignore.
The reader can verify for himself, by constructing an autoregressive series by some such
formula as the above, how difficult it is to draw the line in particular cases. The difficulty,
however, must be faced, for it is precisely the kind which we meet in dealing with observed
series.

(c) Owing to the appearance of the phase angle $\psi$ in equation (30.29) the starting-
point of the correlogram (k = 0) is not to be regarded as a maximum. The period of the
correlogram is therefore to be calculated either by ignoring this point or by reference to
distances between troughs and upcrosses in the correlogram.
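The two periods of (30.32) and (30.33) are readily compared for any a and b. A minimal sketch follows; the function names are illustrative.

```python
from math import acos, pi, sqrt


def autoregressive_period(a, b):
    """Period 2*pi/theta of (30.33), with cos(theta) = -a / (2 sqrt(b))."""
    return 2 * pi / acos(-a / (2 * sqrt(b)))


def upcross_period(a, b):
    """Mean distance between upcrosses, 2*pi/phi of (30.32), cos(phi) = -a/(1+b)."""
    return 2 * pi / acos(-a / (1 + b))


# The scheme u[t+2] = 1.2 u[t+1] - 0.4 u[t] + e[t+2], i.e. a = -1.2, b = 0.4.
long_period = autoregressive_period(-1.2, 0.4)
short_period = upcross_period(-1.2, 0.4)

# Ratio of (30.33) to (30.32) for a = -1, b = 0.5.
ratio = autoregressive_period(-1.0, 0.5) / upcross_period(-1.0, 0.5)
```

Both functions assume 4b > a² and 0 < b ≤ 1, as in 30.18; outside that region `acos` will fail or the scheme is not of the oscillatory type considered.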


30.30. The equation

$$u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}$$

may be regarded as expressing the regression of $u_{t+2}$ on $u_{t+1}$ and $u_t$, the term $\varepsilon_{t+2}$ being a residual error. We may therefore estimate the constants a and b from the regression equation of the observed series in the usual way. If we assume that the series is long enough for end effects to be negligible in determining the variances of the finite series, then $\operatorname{var} u_{t+2} = \operatorname{var} u_{t+1} = \operatorname{var} u_t$, and from the usual formulae for regressions we find

$$a = -\frac{r_1 (1 - r_2)}{1 - r_1^2}, \tag{30.34}$$

$$b = -\frac{r_2 - r_1^2}{1 - r_1^2} = \frac{1 - r_2}{1 - r_1^2} - 1. \tag{30.35}$$

This gives us the constants of the autoregressive scheme from the serial correlations. It should, however, be realised that these estimates are rather sensitive to superposed error of the type we refer to below (30.32), and it is therefore unsafe to estimate the autoregressive period from them. The correlogram itself appears to be a safer guide on this matter.
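The estimates (30.34) and (30.35) require only the first two serial correlations. The following sketch applies them to the values r₁ = 0.595 and r₂ = -0.151 of the sheep data (Table 30.7); the function names are illustrative.

```python
from math import acos, pi, sqrt


def estimate_constants(r1, r2):
    """Constants a and b of the autoregressive scheme from r1 and r2."""
    a = -r1 * (1 - r2) / (1 - r1 ** 2)        # (30.34)
    b = -(r2 - r1 ** 2) / (1 - r1 ** 2)       # (30.35)
    return a, b


def period_from_constants(a, b):
    """Autoregressive period 2*pi/theta, with cos(theta) = -a / (2 sqrt(b))."""
    return 2 * pi / acos(-a / (2 * sqrt(b)))


a, b = estimate_constants(0.595, -0.151)
period = period_from_constants(a, b)
```

As the text warns, a period obtained in this way is sensitive to superposed error, so it is better regarded as a check on the correlogram than as a substitute for it.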


Example 30.6 


Consider again the sheep data of Table 30.7 and Fig. 30.6. Suppose we have decided, 
from the appearance of the correlogram, to attempt to represent the series by an auto- 
regressive scheme. 

In the first place, we have to inquire whether a scheme of the simple linear form (30.3) is likely to be adequate. Would it, for example, be better to consider the more general form

$$u_{t+3} + a u_{t+2} + b u_{t+1} + c u_t = \varepsilon_{t+3};$$

or need we take into account curvilinear regressions such as

$$u_{t+2} + a u_{t+1} + a' u_{t+1}^2 + b u_t + b' u_t^2 = \varepsilon_{t+2}\,?$$

The first point can be elucidated by the use of partial and multiple correlations. The following are the partial coefficients and the function of the multiple correlation $1 - R^2$ as determined by the continued product of $(1 - r^2)$ (cf. vol. I, equation 15.45, p. 380):

Order of Partial     Value of Partial     Π(1 - r²)
Correlation.         Correlation.
12                       0.595             0.6460
13.2                    -0.782             0.2509
14.23                    0.097             0.2485
15.234                  -0.183             0.2402
16.2345                  0.031             0.2400
17.23456                 0.014             0.2400


Evidently no appreciable gain in representation is to be obtained by taking the regression 
on more than the two preceding terms. 

The possibility as to better representation by taking curvilinear regressions may be 
considered by drawing the scatter diagrams of $u_t$ on $u_{t+1}$ and $u_t$ on $u_{t+2}$. These are
shown in Fig. 30.8. It seems clear that there is an essential scatter in the data which no 
ordinary polynomial can represent, and that curvilinear terms are unlikely to add anything 
material to the linear regressions. 

We conclude that if the data are of the autoregressive type it is unnecessary to con- 
sider any more elaborate scheme than the simple type 


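\[ u_{t+2} + a u_{t+1} + b u_t = \varepsilon_{t+2}. \]
For this series we have
\[ r_1 = 0.595, \qquad r_2 = -0.151. \]
Hence
\[ -a = \frac{r_1 (1 - r_2)}{1 - r_1^2} = 1.060 \]
\[ -b = \frac{r_2 - r_1^2}{1 - r_1^2} = -0.782. \]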


Fig. 30.8. Scatter Diagrams of $u_t$ on $u_{t+1}$ (top figure), and $u_t$ on $u_{t+2}$ (bottom figure).


The autoregression equation is
\[ u_{t+2} = 1.060\,u_{t+1} - 0.782\,u_t + \varepsilon_{t+2}. \]

For the autoregressive period we have
\[ \cos\theta = \frac{1.060}{2\sqrt{0.782}} = 0.599, \qquad \theta = 53.2^\circ, \]
and hence the period is $360/53.2 = 6.8$ years.


Now in the correlogram (Fig. 30.6) there are peaks at k = 7, 17 and 25, giving a period 
of about 9 years; and there are troughs at k = 3, 13, 21 and 28, giving a mean period
of 8.3 years. The autoregressive period as estimated from the correlogram is then between
8 and 9 years, whereas that given by the autoregression equation is 6.8 years, considerably
shorter. 

Using the values of $a$ and $b$ found above, we have for the mean distance between
upcrosses,
\[ \cos\theta = \frac{1.060}{1.782} = 0.5948, \qquad \theta = 53.5^\circ, \]
giving a mean distance practically equal to the autoregressive period as shown by the
regression equation.

Finally, looking to the original series, we see that there are nine major peaks, the
first in 1874 and the last in 1932, so that the mean distance between peaks is $58/8 = 7.25$
years; and nine upcrosses, the first between 1872 and 1873 and the last between 1930 and
1931, so that the mean distance between upcrosses is $58/8 = 7.25$ years, the same as for peaks.
The upcross at 1876-7, however, is due to a temporary fall below the zero line, and had it
not occurred we should have found a mean distance of 8.3 years.
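The arithmetic of this example can be retraced in a few lines. This is a sketch under stated assumptions: the constants come from (30.34) and (30.35), the period from $\cos\theta = -a/(2\sqrt{b})$, and the upcross distance from $\cos\theta = -a/(1+b)$, the last being read off from the figures 1.060/1.782 quoted above. The function names are illustrative only.

```python
import math


def ar2_constants(r1, r2):
    """Constants a, b of u[t+2] + a*u[t+1] + b*u[t] = e[t+2], from (30.34)-(30.35)."""
    a = -r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    b = -(r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return a, b


def autoregressive_period(a, b):
    """Period 2*pi/theta with cos(theta) = -a / (2*sqrt(b)); requires 4b > a^2."""
    theta = math.acos(-a / (2.0 * math.sqrt(b)))
    return 2.0 * math.pi / theta


def mean_upcross_distance(a, b):
    """Distance 2*pi/theta with cos(theta) = -a / (1 + b) (assumed from the worked figures)."""
    theta = math.acos(-a / (1.0 + b))
    return 2.0 * math.pi / theta


a, b = ar2_constants(0.595, -0.151)   # serial correlations of the sheep series
period = autoregressive_period(a, b)
upcross = mean_upcross_distance(a, b)
print(round(a, 3), round(b, 3), round(period, 2), round(upcross, 2))
```

Both derived lengths fall well short of the 8 to 9 years suggested by the correlogram, which is the discrepancy the example goes on to discuss.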

We have therefore reached this position: the mean period in the series itself appears 
to be about 7.25 years; that given by the regression constants is 6.8 years; and that given
by the correlogram is about 8.5 years. These figures are scarcely close enough for comfort,
and further data would be required to arrive at a more accurate estimate of the mean 
period. Nevertheless, they illustrate very well the kind of divergence which appears to 
be more the rule than the exception in dealing with short series. We should expect the 
correlogram to give a higher value than the series itself, for there may appear peaks or 
upcrosses in the latter which are purely temporary fluctuations due to the casual element. 
On the other hand, the regression constants appear to give consistently lower values for 
the autoregressive period than the correlogram, an effect found by Yule (1927a) for sunspots, 
Wold (1938a) for cost-of-living indices, and Kendall (1944a) in series of agricultural prices, 
acreage and livestock populations. 


30.31. Let us examine more closely the effect referred to at the end of the previous
example. Our autoregressive system is based on a random element $\varepsilon_{t+2}$ which is added to
the term $u_{t+2}$. We can therefore regard the value at time $t + 2$ as composed of two parts,
a systematic element expressed by $a u_{t+1} + b u_t$ giving the effect of the past history of the
system at times $t + 1$ and $t$, together with a new random element peculiar to the moment.
This latter is random in the sense that it is casual and unpredictable; but once it has
occurred it is incorporated into the motion of the system and exerts an influence on future
history. It is therefore quite unlike an error of observation or a sampling error which
distorts the value of a particular member but does not affect the others.

Now suppose that such an error of observation is present, and let us represent it by
$\eta$. For long series this element will increase the variance of the observed values by $\operatorname{var}\eta$,
but if it is independent of the remaining constituents of the series it will not affect the
covariances. Hence the serial correlations will all be reduced in a constant proportion $c$,
except of course $r_0$; and this, as we proceed to show, will affect the autoregressive period
as derived from the regression constants, in general shortening the period quite considerably.


30.32. If $r_1$ is reduced to $c r_1$ and $r_2$ to $c r_2$, the constants of the regression equations
are, from (30.34) and (30.35),

\[ a = -\frac{c r_1 (1 - c r_2)}{1 - c^2 r_1^2} \tag{30.36} \]
\[ b = -\frac{c r_2 - c^2 r_1^2}{1 - c^2 r_1^2} \tag{30.37} \]

The estimated autoregressive period is then determined by the angle $\theta'$, given by
\[ \cos\theta' = \frac{c r_1 (1 - c r_2)}{2\sqrt{(1 - c^2 r_1^2)(c^2 r_1^2 - c r_2)}}. \]
Differentiating the logarithm of this expression and putting $c = 1$, we find
\[ -2\tan\theta'\,\frac{d\theta'}{dc} = -\frac{2 r_2}{1 - r_2} + \frac{2 r_1^2}{1 - r_1^2} - \frac{r_2}{r_1^2 - r_2}, \]
which reduces to
\[ -\tan\theta'\,\frac{d\theta'}{dc} = \frac{(1 + b)(3b^2 + b - a^2)}{2b\,\{(1 + b)^2 - a^2\}}. \tag{30.38} \]

Now $\tan\theta = \sqrt{4b/a^2 - 1}$ and the period $P = 2\pi/\theta$. We then find
\[ \left(\frac{dP}{dc}\right)_{c=1} = -\frac{P^2 a\,(1 + b)(3b^2 + b - a^2)}{4\pi b\,\{(1 + b)^2 - a^2\}\sqrt{4b - a^2}}. \tag{30.39} \]
This equation gives us an approximate idea of the change in the period $P$ for small
changes in $c$ near $c = 1$. For instance, with $a = -1.5$, $b = 0.9$ we find $P = 9.7$ units,
and from (30.39),
\[ \left(\frac{dP}{dc}\right)_{c=1} = 16.5. \]
Thus, if $c = 0.9$, i.e. the variance of $\eta$ is about 10 per cent. of the total, the period will be
reduced by about 1.65 years, a substantial amount.
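The shortening of the period can also be seen by direct recomputation rather than by differentiation. The sketch below is an illustration under assumptions: it recovers $r_1$ and $r_2$ from the constants by inverting (30.34) and (30.35), multiplies both by $c$, and re-estimates the period. The helper names are hypothetical, and an exact recomputation of this kind need not agree to the last figure with the rounded, first-order values quoted in the text.

```python
import math


def constants_from_correlations(r1, r2):
    """Regression constants from the first two serial correlations, (30.34)-(30.35)."""
    a = -r1 * (1.0 - r2) / (1.0 - r1 ** 2)
    b = -(r2 - r1 ** 2) / (1.0 - r1 ** 2)
    return a, b


def period(a, b):
    """Autoregressive period 2*pi/theta, cos(theta) = -a / (2*sqrt(b))."""
    return 2.0 * math.pi / math.acos(-a / (2.0 * math.sqrt(b)))


def attenuated_period(a, b, c):
    """Period implied by the regression constants when r1 and r2 are both multiplied by c."""
    r1 = -a / (1.0 + b)       # serial correlations of the undisturbed scheme
    r2 = -a * r1 - b
    a_c, b_c = constants_from_correlations(c * r1, c * r2)
    return period(a_c, b_c)


a, b = -1.5, 0.9              # the illustrative constants used in the text
true_period = period(a, b)
for c in (1.0, 0.99, 0.95, 0.9):
    print(c, round(attenuated_period(a, b, c), 2))
```

The printed periods decrease steadily as $c$ falls below unity, in line with the qualitative conclusion of this section.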


30.33. It is thus possible that the observed discrepancies between the autoregressive 
periods as given by the regression constants and the correlogram may be due to superposed 
random fluctuation which is not incorporated into the autoregressive scheme. This is 
not the only possible explanation ; for instance, in particular cases the disturbance function 
$\varepsilon$ may not be random. The hypotheses to be considered in such a case, however, are so
complex that it is difficult to pursue a quantitative investigation without a wealth of
material; and this, unfortunately, is usually denied to us, at least in economic work.
Meteorological data are more numerous, and we may hope that further light will be thrown
on the autoregressive scheme by a re-examination of the material available in this field. 


30.34. Consider now the more extended autoregression equation
\[ u_{t+m} + a_1 u_{t+m-1} + a_2 u_{t+m-2} + \dots + a_m u_t = \varepsilon_{t+m}. \tag{30.40} \]
The explicit solution cannot be given in the simple form available when $m = 2$. It has,
in general, the solution
\[ u_t = A_1 \alpha_1^t + A_2 \alpha_2^t + \dots + A_m \alpha_m^t + B, \tag{30.41} \]
where $\alpha_1, \dots, \alpha_m$ are the roots of
\[ \alpha^m + a_1 \alpha^{m-1} + \dots + a_m = 0 \tag{30.42} \]
and $B$ is a particular integral involving the $\varepsilon$'s. For the series to be oscillatory without
increasing indefinitely no term such as $\alpha^t$, where $\alpha$ is real and greater than unity, can appear.
Assuming this to be so, and assuming further that the series was “started up” some time
before $t = 0$, we reduce the solution to the particular integral $B$.

Choose a particular value $\xi_t$ of $\sum_{j=1}^{m} A_j \alpha_j^t$ such that
\[ \begin{aligned}
\xi_0 &= 0 \\
\xi_1 + a_1 \xi_0 &= 1 \\
\xi_2 + a_1 \xi_1 + a_2 \xi_0 &= 0 \\
&\;\;\vdots \\
\xi_{m-1} + a_1 \xi_{m-2} + \dots + a_{m-1} \xi_0 &= 0.
\end{aligned} \tag{30.43} \]
This is always possible in general, for it imposes $m$ conditions on the $m$ constants $A$. Then
it will be found on substitution that a particular integral $B$ is given by
\[ u_t = \sum_{j=0}^{\infty} \xi_j\,\varepsilon_{t-j+1}, \tag{30.44} \]
a generalisation of (30.24). Our series may then be regarded as generated by a moving
average of infinite extent, the weights being combinations of damped harmonic and
exponential terms.
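The moving-average representation can be checked numerically. The sketch below assumes that the weights $\xi_j$ are generated by $\xi_0 = 0$, $\xi_1 = 1$ and the homogeneous recurrence implied by (30.42) and (30.43), with weights of non-positive index taken as zero, and that the series starts from rest. The function names and the list of shocks are illustrative, not taken from the text.

```python
def ma_weights(coeffs, n):
    """Weights xi_0..xi_n for u[t+m] + a_1 u[t+m-1] + ... + a_m u[t] = e[t+m].

    coeffs = [a_1, ..., a_m]; xi_0 = 0, xi_1 = 1, and for j >= 2
    xi_j = -(a_1 xi_{j-1} + ... + a_m xi_{j-m}), dropping terms with index < 1.
    """
    xi = [0.0, 1.0]
    for j in range(2, n + 1):
        s = 0.0
        for i, a in enumerate(coeffs, start=1):
            if j - i >= 1:
                s += a * xi[j - i]
        xi.append(-s)
    return xi[: n + 1]


def simulate(coeffs, shocks):
    """Generate the series recursively from zero starting values."""
    u = []
    for t, e in enumerate(shocks):
        past = sum(a * u[t - i] for i, a in enumerate(coeffs, start=1) if t - i >= 0)
        u.append(e - past)
    return u


def moving_average_form(coeffs, shocks):
    """Evaluate u_t = sum_j xi_j e_{t-j+1} of (30.44), truncated at the first shock."""
    xi = ma_weights(coeffs, len(shocks))
    return [
        sum(xi[j] * shocks[t - j + 1] for j in range(1, t + 2))
        for t in range(len(shocks))
    ]


coeffs = [-1.060, 0.782]      # the constants of Example 30.6, used as a test case
shocks = [1.0, -0.5, 0.25, 2.0, -1.0, 0.0, 0.75, -0.3]   # arbitrary illustrative values
recursive = simulate(coeffs, shocks)
averaged = moving_average_form(coeffs, shocks)
print(all(abs(x - y) < 1e-9 for x, y in zip(recursive, averaged)))
```

For a series started from rest the two constructions agree term by term, which is the sense in which the scheme is a moving average of the past disturbances.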


30.35. The correlogram of such a series may be determined by the following method,
due to Walker (1931). Multiply (30.40) by $u_{t-k}$, sum, and divide by $n \operatorname{var} u$. We find
\[ r_{k+m} + a_1 r_{k+m-1} + a_2 r_{k+m-2} + \dots + a_m r_k = \frac{\sum (\varepsilon_{t+m}\,u_{t-k})}{n \operatorname{var} u}. \tag{30.45} \]
Now $u_{t-k}$ depends only on $\varepsilon_{t-k}$ and terms with lower subscripts and hence is uncorrelated
with $\varepsilon_{t+m}$ for $k > -m$. Thus we have
\[ r_{k+m} + a_1 r_{k+m-1} + \dots + a_m r_k = 0, \qquad k > -m. \tag{30.46} \]
If we multiply (30.40) by $u_{t+k+m}$ we find similarly
\[ r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \frac{\sum (\varepsilon_{t+m}\,u_{t+k+m})}{n \operatorname{var} u}, \tag{30.47} \]
but the expression on the right no longer vanishes. In fact $u_{t+k+m}$ contains the term
$\xi_{k+1}\,\varepsilon_{t+m}$, and hence
\[ r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \xi_{k+1}\,\frac{\operatorname{var}\varepsilon}{\operatorname{var} u}, \qquad k > -m. \tag{30.48} \]

From (30.46) it follows that the serial correlation $r_k$ will be given by
\[ r_k = \sum_{j=1}^{m} (A_j \alpha_j^k), \tag{30.49} \]
where the $\alpha$'s are the roots of (30.42) and the $A$'s are constants to be determined from initial
conditions. Thus the correlogram will be the sum of terms which either decay exponentially
to zero ($\alpha$ real) or oscillate with a similar decay to zero ($\alpha$ complex). Walker (1931) has
used this result in an inquiry into a series of atmospheric pressures.
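Relation (30.46) gives a direct way of generating the theoretical correlogram once the first $m$ serial correlations are known. The following is a minimal sketch for $m = 2$, assuming the starting values $r_0 = 1$ and $r_1 = -a/(1 + b)$ (the latter following from (30.46) and the evenness of $r_k$) and using the constants of Example 30.6 as a test case; `correlogram` is an illustrative name.

```python
def correlogram(coeffs, start, n):
    """Extend r_0..r_{m-1} (given in start) to r_0..r_n by the recurrence (30.46)."""
    r = list(start)
    while len(r) <= n:
        k = len(r)
        r.append(-sum(a * r[k - i] for i, a in enumerate(coeffs, start=1)))
    return r


a, b = -1.060, 0.782
r = correlogram([a, b], [1.0, -a / (1.0 + b)], 30)
print([round(x, 3) for x in r[:10]])
```

The generated values oscillate and die away, as (30.49) requires when the roots are complex with modulus less than unity.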


The Autocorrelation Function 


30.36. If we have a series $u(t)$ defined at every point of time in some range $-h$
to $+h$, we may define its variance as
\[ \frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt \tag{30.50} \]
on the assumption that the mean value is zero, which does not limit our generality. Suppose
the series is reduced to standard measure by dividing throughout by the square root
of this variance. Then an evident generalisation of the serial correlation is given by
\[ r(k) = \frac{1}{2h}\int_{-h}^{h} u(t)\,u(t + k)\,dt. \tag{30.51} \]
We shall call this the autocorrelation function. We can likewise regard it as defined when
$h$ tends to infinity, provided that the limit on the right in (30.51) exists. It is to be noted
that $r(k)$ is in that case an even function of $k$.


30.37. We shall also consider the function
\[ R(k) = \int_{-\infty}^{\infty} u(t)\,u(t + k)\,dt \tag{30.52} \]
when it exists. We have
\[ \int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{ikp}\,u(t)\,u(t + k)\,dt\,dk
 = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{ip(t+k)}\,u(t + k)\,e^{-ipt}\,u(t)\,dt\,dk. \]
The simple substitution $t + k = q$ reduces this to
\[ \int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq \int_{-\infty}^{\infty} e^{-ipt}\,u(t)\,dt. \]
Thus, if we write
\[ \alpha(p) + i\beta(p) = \int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq, \tag{30.53} \]
we have
\[ \int_{-\infty}^{\infty} R(k)\,e^{ikp}\,dk = \alpha^2(p) + \beta^2(p). \tag{30.54} \]
It follows, as is otherwise evident from the fact that $R(k)$ is an even function, that the
imaginary part on the left of (30.54) vanishes, and we have
\[ \int_{-\infty}^{\infty} R(k)\cos kp\,dk = \alpha^2(p) + \beta^2(p). \tag{30.55} \]

If, following the notation of characteristic functions, we write $\phi_R(p)$ for the integral on
the left in (30.54) and $\phi_u(p)$ for that on the right in (30.53), we have
\[ \phi_R(p) = |\phi_u(p)|^2. \tag{30.56} \]
We may then put
\[ \phi_u(p) = \sqrt{\phi_R}\;e^{i\mu}, \tag{30.57} \]
where $\mu$ is an arbitrary real function. We shall then have
\[ u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_u(p)\,e^{-itp}\,dp
 = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\exp(i\mu - itp)\,dp. \tag{30.58} \]
Since $u(t)$ must be real, the imaginary part vanishes and this is equivalent to
\[ u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \sqrt{\phi_R}\,\cos(\mu - tp)\,dp, \tag{30.59} \]
and $\mu$ must be an odd function of $p$. The result is due to Wiener (1930). It shows that
the autocorrelation function $R$ does not uniquely determine $u(t)$ because of the arbitrary
function $\mu$.
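Relation (30.56) has an exact finite analogue that is easy to verify. The sketch below is not the continuous result itself: it assumes a short series treated as periodic (a circular lagged product sum in place of $R(k)$) and evaluates the transforms only at the frequencies $p = 2\pi s/n$. The data values and function names are arbitrary illustrations.

```python
import cmath
import math


def circular_autocovariance(u):
    """Lagged product sums R(k) = sum_t u[t] * u[(t + k) mod n], for k = 0..n-1."""
    n = len(u)
    return [sum(u[t] * u[(t + k) % n] for t in range(n)) for k in range(n)]


def transform(x, s):
    """Sum of x[t] * exp(i * p * t) at the frequency p = 2*pi*s/n."""
    n = len(x)
    p = 2.0 * math.pi * s / n
    return sum(x[t] * cmath.exp(1j * p * t) for t in range(n))


u = [0.3, -1.2, 0.8, 2.1, -0.4, -1.6, 0.9, 0.1]   # arbitrary illustrative values
R = circular_autocovariance(u)
lhs = [transform(R, s) for s in range(len(u))]            # transform of R
rhs = [abs(transform(u, s)) ** 2 for s in range(len(u))]  # squared modulus of transform of u
print(max(abs(l - r) for l, r in zip(lhs, rhs)))
```

The two sides agree to rounding error, and the transform of $R$ is real, as expected of an even function. The phase of the transform of $u$ is lost, which corresponds to the arbitrary function $\mu$.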


30.38. Consider now the autocorrelation function $r(k)$ as defined in (30.51). Let
us regard the series as defined but equal to zero outside the range $-h$ to $+h$.
Then we have
\[ 2h\,r(k) = \int_{-h}^{h} u(t)\,u(t + k)\,dt = \int_{-\infty}^{\infty} u(t)\,u(t + k)\,dt = R(k), \tag{30.60} \]
where $R$ and $r$ are zero outside the range $-2h$ to $+2h$. The foregoing results then continue
to hold with some modifications concerning factors in $2h$. If we write


ay! »\ pikp qj. — Ly ba n\ pik b : 
8.(0) =;| 1 (het dk = aie R(kyedk  . —. (30.61) 
a 1 rh : bey ee : 
and du (p) = z| uw (t) e!? dé = = u (t) e” dt, “ s . (30.62) 
—h —o 


then corresponding to (30.56) we have 


2¢6,(p) =|¢ulp)[% . . «© «  .« (80.63) 


We may now let $h$ tend to infinity and observe that the results continue to hold under
certain general conditions, provided that the limits exist. 


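Example 30.7
Consider the series
\[ u(t) = A_1\sin(\lambda_1 t + \alpha_1) + A_2\sin(\lambda_2 t + \alpha_2) + \dots + A_m\sin(\lambda_m t + \alpha_m). \]
For the variance we have
\[ \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt = \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} \sum\{A_j^2\sin^2(\lambda_j t + \alpha_j)\}\,dt, \]
since the cross-product terms will contribute only a finite amount to the integral and hence
vanish in the limit,
\[ = \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} \tfrac12\sum A_j^2\{1 - \cos 2(\lambda_j t + \alpha_j)\}\,dt = \tfrac12\sum(A_j^2). \]
Similarly for $u(t)\,u(t + k)$ we have
\[ \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} \Big[\sum\{A_j\sin(\lambda_j t + \alpha_j)\}\Big]\Big[\sum\{A_j\sin(\lambda_j t + \lambda_j k + \alpha_j)\}\Big]dt \]
\[ = \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} \tfrac12\sum\big\{A_j^2\,[\cos\lambda_j k - \cos\{\lambda_j(2t + k) + 2\alpha_j\}]\big\}\,dt = \tfrac12\sum A_j^2\cos\lambda_j k. \]
Thus
\[ r(k) = \frac{\sum\{A_j^2\cos(\lambda_j k)\}}{\sum(A_j^2)}. \]
The correlogram is the sum of a series of harmonics, like the original series, but the
coefficients are different and the harmonics are all in phase.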
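The result of this example can be checked numerically by approximating the integral of (30.51) over a long but finite range. This is a sketch only: the amplitudes, frequencies and phases are arbitrary illustrative values, the integral is replaced by a Riemann sum, and agreement is therefore approximate, improving as the range $h$ grows.

```python
import math

AMPS = [2.0, 1.0, 0.5]      # A_j (illustrative values, not from the text)
FREQS = [0.7, 1.3, 2.9]     # lambda_j
PHASES = [0.4, 1.1, 2.0]    # alpha_j


def u(t):
    """Sum of harmonics A_j sin(lambda_j t + alpha_j)."""
    return sum(A * math.sin(lam * t + al) for A, lam, al in zip(AMPS, FREQS, PHASES))


def empirical_r(k, h=500.0, step=0.05):
    """Riemann-sum approximation to r(k) over the range -h..h."""
    n = int(round(2.0 * h / step))
    num = den = 0.0
    for i in range(n):
        t = -h + i * step
        x = u(t)
        num += x * u(t + k)
        den += x * x
    return num / den


def theoretical_r(k):
    """r(k) = sum A_j^2 cos(lambda_j k) / sum A_j^2, as derived in the example."""
    return sum(A * A * math.cos(lam * k) for A, lam in zip(AMPS, FREQS)) / sum(
        A * A for A in AMPS
    )


for k in (0.0, 1.0, 2.5):
    print(k, round(empirical_r(k), 3), round(theoretical_r(k), 3))
```

Changing the phases in `PHASES` leaves the theoretical correlogram unaltered, which illustrates the remark that the harmonics of the correlogram are all in phase.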


30.39. The idea underlying the autoregressive scheme of representing time-series 
may perhaps be best illustrated by an analogy. Imagine a motor-car proceeding along 
a horizontal road with an irregular surface. The car is fitted with springs which permit 
it to oscillate to some extent but are designed to damp out the oscillations as soon as the 
comfort of the passengers will permit. If the car strikes a bump or a pothole in the road 
the body will oscillate up and down for a time but will soon come to rest so far as vertical 
motion is concerned. If, however, it proceeds over a continual succession of bumps there 
will be continual oscillation of varying amplitude and distance between peaks. The oscilla- 
tions are continually renewed by disturbances, though the distribution of the latter along 
the road may be quite random. The regularity of the motion is determined by the internal 
structure of the car; but the existence of the motion is determined by external impulses. 


30.40. It appears to me very plausible to suppose that oscillations in time-series 
are generated in this way. One does not have to postulate some external rhythmic influence 
which keeps the oscillation going, or to suppose that the system will oscillate without 
damping once it has been set in motion. Nor is it necessary to assume that the majority 
of the deviations between theory and observation are due to “errors” which exert no 
effect on the subsequent movement of the system. The reader, however, will have to 
form his own opinion on this matter.* We now proceed to examine an alternative scheme 


of representation in which the series is represented as a sum of (undamped) cyclic terms. 


Periodogram Analysis 
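30.41. It is well known that under certain general conditions a function $f(t)$ can be
expanded in the Fourier series, valid in a certain range,
\[ \begin{aligned} f(t) = a_0 &+ a_1\cos\frac{2\pi t}{\lambda_1} + a_2\cos\frac{4\pi t}{\lambda_1} + a_3\cos\frac{6\pi t}{\lambda_1} + \dots \\
 &+ b_1\sin\frac{2\pi t}{\lambda_1} + b_2\sin\frac{4\pi t}{\lambda_1} + b_3\sin\frac{6\pi t}{\lambda_1} + \dots \end{aligned} \tag{30.64} \]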


* The scheme considered in this chapter may over-simplify natural conditions in that it assumes 
finite random disturbances at equidistant time-intervals. If the intervals are not equal, or if the dis- 
turbances are small and continually occurring, the autoregressive scheme is only an approximation. 


Much remains to be done on this subject. 




Functions which are not periodic can be expanded in this way; for instance, in the
range $0 < x < \pi$,
\[ \tfrac12 x = \sin x - \tfrac12\sin 2x + \tfrac13\sin 3x - \tfrac14\sin 4x + \dots \]
The function, of course, repeats itself in the range $\pi < x < 2\pi$, and so on.

As a representation of observed series the Fourier series is rather restricted in scope,
since the period of every term is a submultiple of the fundamental period $\lambda_1$. A more general
scheme is provided by the series
\[ \begin{aligned} f(t) = a_0 &+ a_1\cos\frac{2\pi t}{\lambda_1} + a_2\cos\frac{2\pi t}{\lambda_2} + \dots \\
 &+ b_1\sin\frac{2\pi t}{\lambda_1} + b_2\sin\frac{2\pi t}{\lambda_2} + \dots \end{aligned} \tag{30.65} \]
or the alternative form
\[ f(t) = A_0 + A_1\cos\left(\frac{2\pi t}{\lambda_1} + \alpha_1\right) + A_2\cos\left(\frac{2\pi t}{\lambda_2} + \alpha_2\right) + \dots \tag{30.66} \]
Here the $\lambda$'s are not necessarily commensurable. The object of our analysis is first of all
to find out what are the best values of the $\lambda$'s to select, and secondly to evaluate the other
constants $a$ and $b$, or $A$ and $\alpha$.


30.42. Suppose we wish to test whether a time-series contains a harmonic term with
period $\mu$. Consider the series*
\[ A = \frac{2}{n}\sum_{j=1}^{n} u_j\cos\frac{2\pi j}{\mu} \tag{30.67} \]
\[ B = \frac{2}{n}\sum_{j=1}^{n} u_j\sin\frac{2\pi j}{\mu} \tag{30.68} \]
and write
\[ S^2 = A^2 + B^2 = \frac{4}{n^2}\left|\sum_{j=1}^{n} u_j\exp\left(\frac{2\pi i j}{\mu}\right)\right|^2. \tag{30.69} \]

* Some writers define these sums with $j$ from 0 to $n - 1$. The signs of $A$ and $B$ may then differ
from those given by (30.67) and (30.68), but the intensity and phase are unaffected.

Suppose that the series is in fact given by
\[ u_j = a\sin\frac{2\pi j}{\lambda} + b_j, \tag{30.70} \]
where $b_j$ is a component which we will assume to contain no cyclical element, so that its
correlation with the other component is zero, at least for long series. Then we have
\[ A = \frac{2a}{n}\sum\left(\sin\frac{2\pi j}{\lambda}\cos\frac{2\pi j}{\mu}\right) + \frac{2}{n}\sum\left(b_j\cos\frac{2\pi j}{\mu}\right) \]
and the second term may be neglected. Thus, writing
\[ \alpha = \frac{2\pi}{\lambda}, \qquad \beta = \frac{2\pi}{\mu}, \]
we have
\[ A = \frac{2a}{n}\sum(\sin\alpha j\,\cos\beta j) = \frac{a}{n}\sum\{\sin(\alpha - \beta)j + \sin(\alpha + \beta)j\} \]
\[ = \frac{a}{n}\left\{\frac{\sin\frac12(\alpha - \beta)n\;\sin\frac12(\alpha - \beta)(n + 1)}{\sin\frac12(\alpha - \beta)} + \frac{\sin\frac12(\alpha + \beta)n\;\sin\frac12(\alpha + \beta)(n + 1)}{\sin\frac12(\alpha + \beta)}\right\}. \tag{30.71} \]
For large $n$ this remains small unless $\alpha$ approaches $\beta$ (or $-\beta$, which is essentially the same
situation), and in that case we have
\[ A \sim a\sin\tfrac12(\alpha - \beta)(n + 1). \]
Similarly,
\[ B \sim a\cos\tfrac12(\alpha - \beta)(n + 1), \]
so that
\[ S^2 = A^2 + B^2 = a^2. \tag{30.72} \]
Thus $S$ remains small unless the “trial” period $\mu$ approaches the real period $\lambda$, and in that
case equals the amplitude $a$.
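The behaviour of $S$ described here is easily reproduced. The following is a minimal sketch, assuming a pure sine series with no superposed component $b_j$; the amplitude, period and length are arbitrary illustrative choices and `intensity` is a hypothetical helper implementing (30.67) to (30.69).

```python
import math


def intensity(u, mu):
    """Return (A, B, S) of (30.67)-(30.69) for trial period mu; u[0] holds u_1."""
    n = len(u)
    A = 2.0 / n * sum(x * math.cos(2.0 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    B = 2.0 / n * sum(x * math.sin(2.0 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    return A, B, math.hypot(A, B)


amplitude, true_period, n = 3.0, 10.0, 400   # illustrative values only
u = [amplitude * math.sin(2.0 * math.pi * j / true_period) for j in range(1, n + 1)]

for mu in (7.0, 9.0, 10.0, 11.0, 13.0):
    A, B, S = intensity(u, mu)
    print(mu, round(S, 3))
```

At the true period $S$ equals the amplitude; at trial periods well removed from it $S$ is close to zero.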


30.43. Similarly we may expect that if the series consists of a sum of harmonics
with periods $\lambda_1, \lambda_2, \dots, \lambda_m$, $S$ will be small, unless $\mu$ is equal to one of these periods, in
which case it is finite and equal to the amplitude of the term concerned.

This result forms the basis of what is known as periodogram analysis. We select
a number of trial periods $\mu$ and calculate $S^2$ for each of them. $S^2$,
which is called the intensity, is then exhibited as a function of $\mu$, and graphed as ordinate
against $\mu$ as abscissa. The diagram obtained by joining the points, each to the next, is
called the periodogram. If this figure has peaks at certain values $\lambda_1, \dots, \lambda_m$, and we are
prepared to assume that these are not sampling accidents, the values are the appropriate
periods of harmonic terms and the intensity $S^2$ provides the corresponding amplitudes.
The quantities $A$ and $B$ of (30.67) and (30.68) are obtained incidentally and provide the
phase angles $\alpha$ of (30.66). We shall illustrate the arithmetic processes below.


30.44. Fig. 30.9 shows the periodogram of the wheat-price index data of Table 30.1. 
In order not to confuse the diagram for lower values of the trial period we have shown 
only the major fluctuations. The length of the series was about 300 years from 1545 to 
1844, earlier and later figures shown in Table 30.1 not having been taken into account. 
The primary data have been taken from Sir William Beveridge’s classical paper (1922) and 
are shown in Table 30.9. For practical reasons which will emerge presently, certain trial 
periods are taken not over exactly 300 years but over the number N of years shown in 
the table. To reduce the figures to comparability, Beveridge therefore multiplied the
sum $A^2 + B^2$ by $N/300$.
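The correction can be verified against a few rows of Table 30.9. The sketch below takes four rows as printed (period, $N$, $A$, $B$ and tabulated intensity) and recomputes $N(A^2 + B^2)/300$. Agreement is only to within rounding, since $A$ and $B$ are tabulated to two decimal places; `corrected_intensity` is an illustrative name.

```python
def corrected_intensity(N, A, B):
    """Intensity A^2 + B^2 scaled by N/300, as in the last column of Table 30.9."""
    return N * (A * A + B * B) / 300.0


rows = [  # (period, N, A, B, tabulated intensity), copied from Table 30.9
    (5.100, 306, 5.71, 2.98, 42.34),
    (12.800, 320, 5.70, 3.26, 46.01),
    (15.250, 305, -0.28, -8.65, 76.17),
    (20.000, 320, -5.71, 1.69, 37.88),
]

for period, N, A, B, tabulated in rows:
    print(period, round(corrected_intensity(N, A, B), 2), tabulated)
```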


Fig. 30.9. Periodogram of the Beveridge Wheat-Price Index (Table 30.9). Abscissa: period (years); ordinate: intensity.


TABLE 30.9 


Periodogram Analysis of the Beveridge Wheat-Price Index Data of Table 30.1. 
(From J.R.S.S., 1922, 85, 412.) 
The first observation relates to 1545, except where A and B are given in heavy type. 


Period     Number      A.     B.     Intensity           Period     Number      A.     B.     Intensity
(Years).   of Years                  N(A^2 + B^2)/300.   (Years).   of Years                  N(A^2 + B^2)/300.
           N.                                                       N.
2-000 300 +011; — 0-01 2-667 312 — 0-92/ + 1-20 2-38 
2-049 336 — 0-406} — 0-09 0-19 2-687 301 + 1:23; — 0-02 1-52 
2-054 304 + 0-48 | — 0-72 0-77 2-692 | 315 — 0:04} + 0-23 0-06 
2-061 340 + 0:3&| — 0-57 0-54 2-706 322 — 0-27) + 1-33 1:97 
2-069 300 + 0-25] + 0-63 0-46 2-714 304 + 0-83| + 1-17 2-10 
2-074 336 — 0-61/ + 0-51 0-71 2-727 300 + 0:86| + 1-46 2-87 
2-080 312 + 0:92! — 0-50 1-14 2-733 287 + 2:05) + 1-19 6:16 
2-087 288 — 0-52} — 0-11 0-27 2-735 279 + 2-44) + 1-23 7-82 . 
2-095 308 — 0-91} + 0-90 1-69 2°737 312 + 2-23] + 1:00 6:22 
2-105 320 + 0-90) + 0-07 0-86 2-741 296 + 2-43) + 0:25 5:86 
2-112 288 + 0-90} + 0-80 1-38 2-750 | 308 + 0-90| — 0-84 1-55 
2-138 320 + 0-89} + 0-15 0-84 2-762 | 348 — 0:57) — 0-04 0:37 
2-154 308 + 0-48] + 0-23 0:29 2-769 324 + 1-49) + 0-23 2:28 
2-182 288 + 1-32} — 0-59 ay 2-778 325 + 1-20} — 0-92 2:48 
2-200 308 — 0-13} — 0-60 0-39 2-800 336 — 1-01) — 0-19 1-18 
2-222 320 — 0-32] — 0-62 0-52 2-818 310 + 0:55) + 1:07 1-49 
2-261 312 + 0-50} — 0-22 0-31 2-833 323 + 0:78; — 0-10 0-67 
2-286 320 — 0-38] — 0-85 0-93 2-846 296 | + 0-41! + 0-42 0:34 
2-316 308 + 1:39| — 1-05 3-11 2-857 320 + 0:96) + 0:21 1-03 
2-333 308 — 0-10}; — 0-25 0-08 2-875 322 + 0-35) + 0-14 0-15 
2.353 320 + 0-90] + 0-07 0-86 2-888 312 + 1-51| + 0:26 2-43 
2.364 312 — 0-12] — 0-63 0-43 2-895 330 — 0:69) — 1-57 3-21 
2-370 320 + 0-05 | — 0-28 0-08 2-909 320 | + 0-70} — 1-11 1-84 
2-375 304 + 0-29; — 0-43 0-27 2-933 308 — 0-04} + 0:39 0-16 
2-381 300 — 0-19} — 1:22 1-53 2-947 336 — 0-93) — 1-19 2-57 
2-385 310 — 1-:00| — 0-89 1-86 2-960 296 — 0:00} — 1-15 1:30 
2-391 330 — 1:30} — 0-54 2-18 3-000 | 300 — 0:29} — 0-39 0:23 
2-395 309 — 0-72} + 0-60 0-90 3-040 304 + 0:09; + 0°75 0:58 
2-400 312 + 0:34! + 0-68 0-60 3-077 320 + 0-05} + 1:18 1-50 
2-412 328 — 0-08 | — 0-65 0:47 Spill 336 + 0-91) — 0-44 1-15 
2-417 348 + 0-63} + 0-57 0-69 3-143 308 | + 2:01| + 0:23 4:20 
2-435 336 + 0-44} + 0-01 0-22 3-167 304 | + 0-46; — 1-05 1-33 
2-452 304 — 1-40| — 0-51 2-23 3-200 320 + 0-43 | + 0-95 1-16 
2-462 320 — 0-25); + 1-49 2-44 3-217 296 + 1-25] + 0-00 1-55 
2-476 312 — 0-38} + 0-35 0:27 3°250 312 — 1:22} — 0-47 1-80 
2-483 288 -| — 0-07| + 0-74! 0-53 3-273 324 — 0:55| + 1-18 1-82 
2-500 320 — 0-24] + 1:19] 1-56 3-286 322 — 0-11} + 0-99 1-07 
2-512 324 + 0:86) + 0:39 0-97 3-304 304 | + 0-13} + 0-75 0-59 
2-516 312 + 0-45| + 0-24 0-26 3-333 320 | + 0-90/ + 1-58 3°54 
2-529 301 — 0:19} — 0-31 0-13 3-364 296 + 1-76) + 0-98 4:00 
2-545 336 — 1:39' — 0-81 2°89 3°375 324 + 0:55| + 0-92 1-24 
2-555 322 + 0-38! + 0-50 0-42 3-385 308 + 0-35} + 1-03 1-21 
2-571 306 + 1-25} + 0-55 1-91 3-400 323 + 1-12] + 2-37 7-41 
2-588 308 + 0-30] + 0-43 0-28 3-407 276 + 2-98] + 2-81 14-90 
2-600 312 + 1:02} — 0-39 1-25 3-412 348 + 1-27) — 3-98 15-53 
2-615 306 — 0-75| — 0-24 0-63 3-417 328 + 3:08) — 2-24) 15-84 
2:625 294 — 0-45| + 1-36 2-01 3-429 288 + 3-11; — 1-40 11-16 
2-643 296 + 0:95} — 0-62 1-27 3-444 310 + 0:09| — 0-99 1-03 


3°455 304 + 0-55 | + 0-29 0-39 4:933 296 + 1:57; + 1-58 4-91 
3462 315 + 1-57] + 1-02 4-87 5-000 300 + 1-85} + 1-00 4-30 
3-500 308 + 1-20| — 0:94 2°38 5-067 304 — 0-05| + 3-98 16-09 
3-524 296 + 1-41; — 1-18 3°31 5-091 336 — 0-73) + 5-55 35-05 
3:538 322 + 0:50} — 1:45 2°53 5-100 306 + 5-71] + 2-98 42-34. 
3-556 320 + 0-02} — 0-43 0-20 5-111 322 + 5-70| + 0-29 34°91 
3-571 325 + 0-80] — 0-69 1-21 5-125 328 + 3:97/ + 2:90 26-38 
3:600 324 — 1:03] + 0-82 1-88 5-143 324 + 2:46) + 2-46 13-09 
3-619 304 + 1-18} + 1-23 2:94 5-200 312 + 0:02| + 0-30 0-10 
3:636 320 + 1-14) + 0-13 1-39 5-250 294 + 1-74} + 1-92 6:56 
3-643 306 — 0-16] + 0:27 0-10 5-333 320 + 0-71; — 4-46 21-72 
3:667 308 — 2-14; — 1:07 5:87 5-400 324 + 1:04] + 3-71 16:06 
3-679 309 + 0-34] — 1-90 3-83 5-415 325 + 4-27| + 1-90 23-66 
3-692 288 + 1:28} — 0-22 1-63 5-429 304 + 4:72| — 0-28 22-61 
3-700 296 + 0-90} — 0-59 1:18 5-455 300 + 1:37| — 3-73 15:76 
3-714 312 + 1-15; + 1-78 4-65 5-500 308 — 1-04} + 1-49 3:39 
3:727 287 — 0-45} — 1-65 2-72 5°555 300 + 2-40) — 0-68 6-23 
3-750 315 + 0-64] — 0-06 0-44 5-600 336 + 0-46) + 1-21 1-88 
3-778 306 — 1:17} — 0-68 1-86 5-667 306 + 5-31} — 1-97 32:72 
3-800 304 + 1:60} + 0-80 3-24 5-692 296 + 2:05| — 3-91 19-18 
3-833 322 — 1:12) — 1-63 4:17 5-714 320 + 0-35] — 2-13 4:97 
3°857 324 + 1:63] + 0-45 3-08 5-750 322 + 1-39} — 0-33 2°18 
3-888 280 — 0-15] + 0-66 0-43 5-800 290 + 3-55] — 2-75 19-47 
3-895 296 — 0-66 | + 1-00 1-42 5-846 304 + 0:00] — 2-29 5-35 
3-923 306 + 0-64; — 1-61 3:06 5-933 356 + 4-37) + 0-91 23-63 
3°962 309 — 0-67) + 1-74 3:59 6-000 300 — 3-50| — 0-12 12-29 
4-000 300 + 1:47) — 1-13 3-64 6-111 330 — 0-79) — 1-90 4-66 
4-077 318 + 0-57] — 0-26 0-41 6-143 301 + 0-74] — 2-96 9-32 
4-111 296 + 1:13) — 1-70 4:13 6:167 296 — 0:22| — 2-94 8-56 
4-143 290 — 0:50| + 0-23 0:30 6-200 310 — 2:02) — 3:38 16-02 
4-167 325 + 1-21} + 0:32 1-70 6-250 325 — 3:23| — 0-11 11-30 
4-173 322 + 0-66] — 1-46 PTT 6-286 308 — 1:72} — 0-59 3-41 
4-200 294 — 0-99} — 0-41 1-02 6-333 304 — 1-52] + 1-29 4:02 
4-250 323 + 0°50! — 2-73 8-32 6-400 320 + 0:80} + 2-74 8-71 
4286 300 — 0-65; + 0-79 1-04 6-500 312 + 0-69) — 0-73 0-94 
4-333 312 — 1-50} — 1-30 4-10 6-571 322 + 1:49] — 0:77 3:02 
4-353 296 — 2-85] — 0-24 8-05 6-667 320 + 0-25}; + 0-21 0-11 
4-364 288 — 2-98] + 0-75 9-07 6-727 296 + 0:08}; — 0-13 0-02 
4-375 315 — 2-47] + 0-87 719 6-750 324 — 0-20; — 1-66 3-01 
4-385 342 — 0:50) + 2-55 erg 6-800 306 + 0-23 | — 0-65 0-48 
4-400 308 — 1:38] + 3-27 12-89 6-909 304 + 0:58} + 2-56 7:00 
4-412 300 + 0-08] + 3-62 13-1] 6-933 312 + 1-68] + 2-01 7-15 
4-417 318 + 0-87] + 3:85 16-48 7:000 308 + 3-10| — 2-17 14-74 
4-429 310 + 1:80; + 2-41 9-32 7-143 300 + 1-83) — 1-86 6-79 
4-444 320 + 2-15] + 0-83 5-66 7:200 324 + 0-54) — 3:93 16-96 
4-471 304 + 0-91} + 0:79 1-48 7-333 308 + 1-52] — 2-81 10-46 
4-500 306 ae esi |) ae Mey 4-09 7400 296 — 2-33] — 2-72 12:65 
4-571 320 — 0-21] + 0:04 0-22 7417 356 + 1-50; — 4-01 21-72 
4-600 322 — 0-08] + 1-24 1-65 7-429 312 — 3:80| — 1-49 17-28 
4-667 336 + 0:19} + 0:93 1-00 7-500 315 + 0-17} + 1-50 2:40 
4-750 304 — 0-12} + 2-28 5-28 7-600 304 — 2-33} — 1:37 7-43 
4-800 288 + 2-44| + 1-08 6-84 7-667 322 — 1-46) — 2-61 9-57 
4-857 306 — 1-06! — 1:30 2-89 7-750 310 + 1-38} — 0-39 2-13 
4-888 312 — 1:80] + 2-11 8-00 7-857 330 — 0-50| + 0-28 0-36 



8-000 312 — 3-96) + 1:34, 18-67 17-500 280 — 6:18) ~— 4:45 54:12 
8-091 356 + 4:32) — 0-98) 23°23 18-000 306 — 4:40) + 1-25 21-29 
8-200 287 + 1-62] — 0-64 2-90 18-500 296 — 1-46] + 2-25 7-10 
8-222 296 + 0-19! — 0-56 0-34 19-000 304 + 1-00} — 0-23 1-07 
8-333 325 + 0:21| + 0-91 0-95 19-750 316 — 4-73] — 1-59 26-25 
8-500 323 + 0:17| + 3-19 10-41 20-000 320 — 5:71| + 1-69 37-88 
8-667 312 + 2-51] — 1-01 7-59 21-000 294 + 0-78] + 2-61 7-28 
8-800 308 + 2-97| + 0-83 9°77 22-000 308 + 1:87] + 1-58 6-18 
9-000 306 — 1-51| — 0-57 2°65 23-000 322 — 2:45| — 1-43 8-61 
9-200 322 — 0-16} — 1-56 2°65 24-000 288 + 0-45) + 5:19 26-10 
9-333 336 — 0:74, + 0-64 1-08 24-667 296 +431; + 1-99 22-21 
9-500 304 + 1:08) + 1-07 2°26 25-000 325 + 3-86] — 0-19 14-94 
9-667 290 + 5-03| + 0-37 24-55 26-000 312 + 1-23] — 1-34 3°43 
9-750 312 + 4:46| — 3-56 33-89 27-000 324 + 0-50} — 0-33 0-38 
9-818 324 + 1-21| — 4-94 27-90 28-000 308 — 0-49] + 0-68 0-72 
10-000 320 — 1-19] — 0-83 2:25 29-000 290 + 1-08| — 2-12 5:46 
10-200 306 + 0-86| — 0-22 0:80 30-000 300 — 1-53] — 2-34 7-81 
10-250 328 — 0-69) + 1-10 1-84 31-000 310 — 1-98; + 0-13 4:06 
10-400 312 + 1:88] — 1-65 6-52 32-000 320 — 0:37] + 0-51 0-42 
10-500 294 + 2:46| — 1-82 9-19 33-000 330 + 0-96, — 0:78 1-68 
10-750 301 + 1:47) — 3-13 11-98 34-000 306 — 3:00| — 2-15 13-90 
10-800 324 + 1:00} — 4-75 25-48 35-000 280 — 4-64/ + 1-79 23°11 
11-000 308 — 3°85| — 4-26 33-84 36-000 288 — 1-65) + 4-85 23-29 
11-200 336 — 2-48) + 0:55 7-24 37-000 296 + 2:08] + 3-92 19-47 
11-500 322 — 1-32] — 0-66 2-34 38-000 304 + 2:99) + 0-56 9-37 
11-667 280 + 0:46} + 1-42 2-07 40-000 320 — 1-44; — 0-63 2°63 
12-000 312 — 2:47} — 4-04 23-30 41-000 328 — 1-93) + 0-93 5-01 
12-143 340 — 0:22) — 4-37 21-66 42-000 294 + 0-93] + 3-02 9-75 
12-333 296 — 2:44] + 2-74 11-43 44-000 308 + 3:00| — 0-14 9-27 
12-500 325 — 1-22) + 2-63 9-13 45-000 315 + 1:69| — 1-99 7-14 
12-667 304 + 2:28| + 5:19 32-58 46-000 322 + 0:16] — 2-27 5:58 
12-800 320 + 5-70| + 3-26 46-01 48-000 288 — 0-76] — 0-09 0-56 
12-875 309 + 6-46} + 0-77 43-58 50-000 300 + 1-83] + 2-19 8-14 
13-000 312 + 4:26| — 4-32 38-23 52-000 312 + 4:77| — 0-57 24-03 
13:333 320 + 0-40} + 0:37 0-32 53-000 318 + 4:22) — 2-60 26-08 
13-500 324 + 2-56] — 2-09 11-79 54-000 324 + 2-84) — 4:01 26-09 
13-667 328 + 3:49] — 1-34 15-28 55-000 330 + 3-54} — 3-30 25°82 
14-000 308 + 1:15; — 1:00 2°38 56-000 336 + 3-31) — 2-36 18-47 
14-500 290 — 3-78| — 0-18 13-82 58-000 290 + 3-89| + 1-49 16-82 
14-667 308 — 1-50} + 4:23 20-69 60-000 300 — 3-08| — 0-93. 10-32 
15-000 300 + 6-32} — 2-66 46:83. 62-000 310 — 1-62] + 0-39 2-88 
15-200 304 + 1-19} — 8-52 75-04 64-000 320 — 0-78] + 0-13 0-66 
15-250 305 — 0-28] — 8-65 76:17 66-000 330 — 0-56| — 0-56 0:69 
15-286 321 — 2°35| — 7-15 60-62 68-000 340 + 2-90) — 1-88 13-58 
15-333 322 — 3:89} — 6-55 62-29 70-000 280 — 0-69} — 0-16 0:47 
15-500 310 — 6:92] — 2-02 59-11 74-000 296 — 1-20] + 0-82 2°07 
16-000 320 — 1-46; + 4-52 24-02 76-000 304 — 0:66] + 1:17 1-83 
16-667 300 + 5-21] — 0-39 27-33 78-000 312 + 0-58/ + 1-26 2-00 
17-000 306 + 2-56} — 6-35 47:84 80-000 320 + 0-77| + 0-82 1:34 
17-333 312 — 3:04| — 6-65 54-55 84-000 336 + 0:26] + 0-69 0:62 




An examination of the periodogram suggests the possibility of 20 periods, as follows:

Period      Corrected Intensity      Period      Corrected Intensity
(Years).    N(A^2 + B^2)/300.        (Years).    N(A^2 + B^2)/300.

 2.735        7.82                   11.000       33.84
 3.417       15.84                   12.000       23.30
 4.417       16.48                   12.800       46.01
 5.100       42.34                   15.250       76.17
 5.415       23.66                   17.333       54.55
 5.667       32.72                   20.000       37.88
 5.933       23.63                   24.000       26.10
 7.417       21.72                   35.000       23.29
 8.091       23.23                   54.000       26.09
 9.750       33.89                   68.000       13.58


This is evidently rather an embarrassing profusion of possibilities, and we cannot 
immediately accept all these periods as significant. Sir William discussed them in detail 
in the original paper and was inclined to attribute reality to 18 or 19 of them, partly on 
grounds which do not concern us here, such as the existence of weather oscillations with 
these “periods”. In particular, where a period had a high intensity he analysed the 
two halves of the series separately to see whether the periods persisted, finding that most 
of them did. 


30.45. An inspection of the correlogram of the series in Fig. 30.5 reveals a striking 
difference between the two methods of analysis. From the correlogram we should be 
inclined to suspect a mean period of about 15 years, corresponding to the peak of greatest 
intensity in the periodogram, with a subsidiary ripple of about 5 to 6 years’ period, corresponding
to one or more of the peaks in the periodogram; but of the other 18 periods there
is no sign. The conclusion is inevitable that either the correlogram is insensitive or the 
periodogram is misleading. Having raised this highly important question we shall, unfor- 
tunately, have to leave it unsettled in part ; but we shall show that at least three-quarters 
of the periods thrown up for consideration by the periodogram are not significant. 


30.46. The calculation of the intensity S* depends on that of the quantities A and B 
of equations (30.67) and (30.68). Suppose in the first place that our trial period mu is an 
integer. We then write down the series in rows of yw, thus :— 


Uy U2 Us . . . U,, 
Un+1 Un+2 Up+3 es 8 0 Uey 
: (30.73) 
Wp—1) w+ Up—1) w+2 U(o—1) w+3 os 6 Un 
Totals my, Ms Ms << 772 


We continue writing down the rows until there are fewer than terms remaining, the 
extra terms being left out of account. The number py is then as near in multiples of uw 
as we can get to the number in the series n, and may be denoted by N. This array is some- 
times known as the Buys-Ballot table. 


PERIODOGRAM ANALYSIS 431 


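We then form the sum

\[
A = \frac{2}{p\mu}\left\{ m_1 \cos\frac{2\pi}{\mu} + m_2 \cos\frac{4\pi}{\mu} + \dots + m_\mu \cos\frac{2\pi\mu}{\mu} \right\} \tag{30.74}
\]

and this is clearly the quantity $A$ of (30.67) for the series of $N$ terms. Similarly we have

\[
B = \frac{2}{p\mu}\sum_{j=1}^{\mu}\left( m_j \sin\frac{2\pi j}{\mu} \right). \tag{30.75}
\]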


If the trial period $\mu$ is a rational fraction with numerator $\nu$ we write the series down
in rows of $\nu$ and proceed in the same way; and if it is irrational or is a number which gives
a large value of $\nu$ when expressed as a fraction, we take two convenient neighbouring values
of $\mu$ and interpolate in the periodogram.


30.47. In actual practice we do not write down the array (30.73). The sums $m$
may be formed on an adding machine by starting with $u_1$ and then adding every $\mu$th
member to give $m_1$; then starting with $u_2$ and adding every $\mu$th member to give $m_2$, and so on.
Or alternatively, the values may be written on cards, one for each member of the series,
and the pack dealt into $\mu$ heaps. The total of the $m$’s, together with any members left
over, equals the sum of the series and provides a check on the work.
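
The adding-machine procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `buys_ballot_sums` and `intensity` are hypothetical, the trial period is taken to be an integer, and $A$ and $B$ are formed as in (30.74) and (30.75) with $N = p\mu$.

```python
import math

def buys_ballot_sums(u, mu):
    """Column totals m_1..m_mu of the Buys-Ballot table for an integer trial period mu.

    Returns (m, p, leftover): the totals, the number of complete rows p,
    and the trailing terms that do not fill a further row.
    """
    p = len(u) // mu                      # number of complete rows
    used = p * mu                         # N = p * mu terms enter the table
    # "start with u_j and add every mu-th member": a stride-mu slice of the used part
    m = [sum(u[j:used:mu]) for j in range(mu)]
    return m, p, u[used:]

def intensity(m, p):
    """A, B and S^2 = A^2 + B^2 from the column totals, following (30.74)-(30.75)."""
    mu = len(m)
    n_used = p * mu
    a = 2.0 / n_used * sum(mj * math.cos(2 * math.pi * j / mu)
                           for j, mj in enumerate(m, start=1))
    b = 2.0 / n_used * sum(mj * math.sin(2 * math.pi * j / mu)
                           for j, mj in enumerate(m, start=1))
    return a, b, a * a + b * b

# Made-up series of 23 terms and trial period 5: four rows, three terms left over.
u = list(range(1, 24))
m, p, leftover = buys_ballot_sums(u, 5)
# The check described in the text: totals plus left-over members give the series sum.
assert sum(m) + sum(leftover) == sum(u)
```

A pure cosine of period 5 and unit amplitude, taken over a whole number of periods, should return $A = 1$ and $B = 0$, which is a convenient check on the signs and the factor $2/N$.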


Example 30.8 


Consider the Beveridge series of Table 30.1. For the trial period 2 we may take 300
terms of the series, and $m_1$ (about zero mean) will be the sum of the values $u_1, u_3, \dots, u_{299}$
and $m_2$ will be the sum of the values with even subscripts. These sums are, for the years
1545 to 1844 inclusive,

\[
m_1 = 14{,}909, \qquad m_2 = 14{,}893.
\]

The mean is 14,901, so that about the mean of the series

\[
m_1 = +8, \qquad m_2 = -8.
\]

Now, for a trial period 2, $\sin\frac{2\pi j}{2}$ vanishes and hence $B = 0$. For $A$ we have (in our
notation, which gives different signs from Beveridge’s to $A$ and $B$)

\[
A = \frac{2}{300}\left\{ m_1 \cos\frac{2\pi}{2} + m_2 \cos\frac{4\pi}{2} \right\} = \frac{2}{300}\,(-8 - 8) = -\frac{16}{150}.
\]

Thus $S^2$ (corrected) $= \dots$

as shown in Table 30.9.
For a trial period 2.600, we could take $\mu = \frac{13}{5}$ and arrange the series in rows of 13,
requiring 23 rows accounting for 299 values of the series. We may, however, save
ourselves some arithmetic by taking 24 rows, a multiple of 4, occupying 312 observations.
Or rather, we take 6 rows of 52, giving us the values for a trial period 52; then add $m_1$
to $m_{27}$, $m_2$ to $m_{28}$ and so on, giving the result we would have got by taking 12 rows of 26
and hence providing the values for a trial period of 26; then we add again in the same way,
and so on, obtaining successively the values of $m$ required for trial periods of 13, 6.5, and
3.25. Similarly, by multiplying the original 52 values of $m_j$ by the respective values of
$\cos\frac{2\pi kj}{52}$ and $\sin\frac{2\pi kj}{52}$, $k$ being an integer, we get the values of $A$ and $B$ required for a
trial period of $\frac{52}{k}$. It is thus evident that we can use the single set of 52 values of $m$ to
provide the required constants for trial periods $\frac{52}{2}$, $\frac{52}{3}$, $\frac{52}{4}$ and so forth. This is the main
reason why, in Table 30.9, 312 observations are shown as $N$ for the trial periods 2.080,
2.261, 2.364, 2.476, 2.600, 2.737, 2.888, 3.250, 3.714, 4.333, 5.200, 6.500, 7.429, 8.667,
10.400, 13.000, 17.333, 26.000 and 52.000. The arithmetic, though difficult enough, is not as
laborious as appears at first sight.
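
The economy of working from a single set of 52 totals can be illustrated with a short Python sketch. The data here are made-up integers standing in for 312 terms of a series, and the helper names (`column_sums`, `fold`, `amplitudes`) are hypothetical; the point is only that folding the 52 totals reproduces the totals for rows of 26 and 13, and that the same 52 totals yield $A$ and $B$ for a trial period $52/k$.

```python
import math

def column_sums(u, mu):
    """Totals m_1..m_mu when the first p*mu terms of u are written in rows of mu."""
    p = len(u) // mu
    return [sum(u[j:p * mu:mu]) for j in range(mu)]

def fold(m):
    """Add m_j to m_(j + half): the totals for rows of half the length (len(m) even)."""
    half = len(m) // 2
    return [m[j] + m[j + half] for j in range(half)]

def amplitudes(m, n_used, k=1):
    """A and B for trial period len(m)/k, from totals m covering n_used terms."""
    mu = len(m)
    a = sum(mj * math.cos(2 * math.pi * k * j / mu) for j, mj in enumerate(m, start=1))
    b = sum(mj * math.sin(2 * math.pi * k * j / mu) for j, mj in enumerate(m, start=1))
    return 2.0 * a / n_used, 2.0 * b / n_used

# Hypothetical data: 312 made-up integers, not the Beveridge series.
u = [((37 * t) % 101) - 50 for t in range(1, 313)]

m52 = column_sums(u, 52)           # 6 rows of 52
m26 = fold(m52)                    # same as 12 rows of 26
m13 = fold(m26)                    # same as 24 rows of 13
A, B = amplitudes(m52, 312, k=5)   # trial period 52/5 = 10.4
```

The folding works because the cosine and sine multipliers for period 26 repeat after 26 columns, so columns $j$ and $j + 26$ of the 52-column table always receive the same multiplier.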


30.48. There is an interesting relation between the periodogram and the correlogram
by which the latter, in theory, determines the former. We consider, as in 30.38, a function
$u(t)$ defined at every point of time in some range $-h$ to $h$. Then

\[
\alpha(p) + i\beta(p) = \frac{1}{h}\int_{-h}^{h} u(t)\,e^{ipt}\,dt
= \frac{1}{h}\int_{-h}^{h} \cos pt\; u(t)\,dt + \frac{i}{h}\int_{-h}^{h} \sin pt\; u(t)\,dt \tag{30.76}
\]

corresponds to the sums of (30.67) and (30.68) and may be written $A + iB$, where

\[
p = \frac{2\pi}{\mu}. \tag{30.77}
\]


It follows that the intensity $S^2$ is related to the Fourier transform of $r(k)$ by the relation,
derived from (30.63),

\[
S^2 = \alpha^2(p) + \beta^2(p) = \frac{2}{h}\int_{-2h}^{2h} r(k)\,e^{ikp}\,dk, \tag{30.78}
\]

which is true also in the limit, subject to conditions of existence. Thus the intensity is,
if $r(k)$ exists over an infinite range, the quantity

\[
\lim_{h \to \infty} \frac{2}{h}\int_{-h}^{h} r(k) \cos kp \; dk,
\]

and if $R(k)$ exists the parallel quantity

\[
\int_{-\infty}^{\infty} R(k) \cos kp \; dk.
\]


The periodogram is thus derivable from the autocorrelation function. Since the latter 
does not uniquely determine the series the periodogram will not do so either. 


Example 30.9 
Consider the autocorrelation function, which in present notation may be written

\[
R(k) = \frac{\rho^{k} \sin (k\theta + \psi)}{\sin \psi}.
\]

This, as we have seen, represents the correlogram of an autoregressive series of the simple
linear kind involving $u_{t+2}$, $u_{t+1}$ and $u_t$. We may write this as

\[
R(k) = \frac{e^{-qk} \sin (k\theta + \psi)}{\sin \psi}, \qquad q > 0,
\]

since $\rho$ is less than unity. It is to be remembered that since $R(-k) = R(k)$, the modulus
of $k$ is to be used when $k$ is negative.
We have

\[
S^2 = \int_{-\infty}^{\infty} \frac{e^{-q|k|} \sin (k\theta + \psi)}{\sin \psi} \cos kp \; dk
= \int_{-\infty}^{\infty} e^{-q|k|} \cos k\theta \cos kp \; dk
= \frac{q}{q^2 + (\theta - p)^2} + \frac{q}{q^2 + (\theta + p)^2}.
\]

This is the intensity in the periodogram of the series, $p$ being the quantity $\frac{2\pi}{\mu}$ and not to
be confused with our original damping factor $\rho$.

It is remarkable that, as $q$ becomes large, $S^2$ tends to the constant value $\frac{2}{q}$;
that is to say, the periodogram tends to a fixed level, without peaks. From the analogy
with the analysis of light-rays into colours (each colour corresponding to a particular
harmonic), we may say that the periodogram develops a “continuous spectrum”. In a
very interesting chapter on periodogram analysis Davis (1941) has given a number of
examples exhibiting this kind of effect.
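
The closed form of this example, namely the integral of $e^{-q|k|}\cos k\theta \cos kp$ over all $k$, can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the integral is truncated at an assumed upper limit where the exponential is negligible, and a plain trapezoidal rule is used.

```python
import math

def spectrum_closed_form(p, q, theta):
    """q/(q^2 + (theta - p)^2) + q/(q^2 + (theta + p)^2)."""
    return q / (q * q + (theta - p) ** 2) + q / (q * q + (theta + p) ** 2)

def spectrum_numeric(p, q, theta, upper=60.0, steps=200_000):
    """Trapezoidal estimate of the integral of exp(-q|k|) cos(k theta) cos(k p) over all k.

    The integrand is even in k, so twice the integral over (0, upper) is taken;
    `upper` is an assumed cut-off at which exp(-q * upper) is negligible.
    """
    h = upper / steps

    def f(k):
        return math.exp(-q * k) * math.cos(k * theta) * math.cos(k * p)

    total = 0.5 * (f(0.0) + f(upper)) + sum(f(i * h) for i in range(1, steps))
    return 2.0 * h * total

# Light damping (small q): a peak near p = theta.
peak = spectrum_closed_form(1.0, 0.5, 1.0)
off_peak = spectrum_closed_form(2.5, 0.5, 1.0)

# Heavy damping (large q): the ordinate is nearly 2/q whatever p is.
flat = spectrum_closed_form(0.5, 50.0, 1.0)
```

With $q = 0.5$ the ordinate at $p = \theta$ is several times that at $p = 2.5$, whereas with $q = 50$ it differs from $2/q$ only in the fifth decimal place.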


Significance of a Periodogram 
30.49. Suppose that the values $u_1, \dots, u_n$ are random elements from a normal
population with variance $\sigma^2$. Then the function

\[
A = \frac{2}{n}\sum_{j=1}^{n} u_j \cos\frac{2\pi j}{\mu}
\]

is normally distributed with variance

\[
\operatorname{var} A = \frac{4\sigma^2}{n^2}\sum_{j=1}^{n} \cos^2\frac{2\pi j}{\mu} = \frac{2\sigma^2}{n} \tag{30.79}
\]

and similarly

\[
\operatorname{var} B = \frac{2\sigma^2}{n}. \tag{30.80}
\]

We also see that $\operatorname{cov}(A, B) = 0$ so that $A$ and $B$ are independent. Hence the joint
distribution of $A$ and $B$ is

\[
dF = \frac{n}{4\pi\sigma^2} \exp\left\{ -\frac{n}{4\sigma^2}\,(A^2 + B^2) \right\} dA\, dB. \tag{30.81}
\]

Thus the distribution of $S^2 = A^2 + B^2$ is

\[
dF = \frac{n}{4\sigma^2} \exp\left( -\frac{n S^2}{4\sigma^2} \right) dS^2. \tag{30.82}
\]

The probability that $S^2$ exceeds $\frac{4\sigma^2\kappa}{n}$ in value is immediately obtainable as $e^{-\kappa}$.
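
The two steps taken here, from (30.81) to (30.82) and from (30.82) to the tail probability, may be written out as follows. This is a sketch of the routine integration, with the polar angle $\phi$ introduced only for the purpose.

```latex
% Polar co-ordinates: A = S\cos\phi, \; B = S\sin\phi, so that
% dA\,dB = S\,dS\,d\phi = \tfrac{1}{2}\,dS^2\,d\phi .
\begin{align*}
dF &= \frac{n}{4\pi\sigma^2}\exp\left(-\frac{nS^2}{4\sigma^2}\right)\tfrac{1}{2}\,dS^2\,d\phi ,
      \qquad 0 \le \phi < 2\pi ,\\
\intertext{and integration over $\phi$ gives (30.82). Writing $x$ for $S^2$,}
P\left(S^2 > \frac{4\sigma^2\kappa}{n}\right)
   &= \int_{4\sigma^2\kappa/n}^{\infty} \frac{n}{4\sigma^2}
      \exp\left(-\frac{nx}{4\sigma^2}\right) dx
    = e^{-\kappa} .
\end{align*}
```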


30.50. This result is due to Schuster (1898), but it gives only the probability that
a value of $S^2$ chosen at random will exceed a given value; whereas in the periodogram
we deliberately pick out the biggest values for inspection. Walker (1914) pointed out that
if $e^{-\kappa}$ is small the probability that all of $m$ independent values of $S^2$ should not exceed
$\frac{4\sigma^2\kappa}{n}$ is $(1 - e^{-\kappa})^m$, so the probability that at least one should exceed that amount is

\[
1 - (1 - e^{-\kappa})^m. \tag{30.83}
\]

Davis (1941) gives tables of this function.


30.51. Both the Schuster and the Walker tests depend on a knowledge of $\sigma^2$. Since
the mean value of $S^2$ in (30.82) is $\frac{4\sigma^2}{n}$ the usual procedure is to consider the test as a
comparison of $S^2$ with $E(S^2)$; but $\sigma^2$ itself has to be estimated from the original data.


30.52. Fisher (1929a) has given a test which avoids the inexactitude due to the
estimation of $\sigma^2$. If $v$ is the estimate and $S^2$ is the largest intensity, then the probability that


\[
g = \frac{S^2}{2v} \tag{30.84}
\]


will exceed a given value is 


\[
\nu (1 - g)^{\nu - 1} - \binom{\nu}{2} (1 - 2g)^{\nu - 1} + \dots + (-1)^{m-1} \binom{\nu}{m} (1 - mg)^{\nu - 1}, \tag{30.85}
\]

where $\nu = \frac{1}{2}(n - 1)$, $n$ being the (odd) number of observations, and $m$ is the greatest
integer less than $1/g$. The result was extended by Stevens (1939a); see also Fisher (1940a)
and Finney (1941a). Davis (1941) also gives tables of this function.
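
The series (30.85) is easily evaluated by machine. The following Python sketch is an illustration only; the function name `fisher_g_tail` is hypothetical, and the final clamping to the range 0 to 1 is a guard against rounding error added here, not part of the formula.

```python
import math

def fisher_g_tail(g, nu):
    """Probability that Fisher's ratio exceeds g, by the alternating series (30.85).

    nu = (n - 1) / 2 for an odd number n of observations; the sum runs to
    m, the greatest integer strictly less than 1/g (and never beyond nu).
    """
    if not 0.0 < g <= 1.0:
        raise ValueError("g must lie in (0, 1]")
    m = min(nu, math.ceil(1.0 / g) - 1)
    total = 0.0
    for j in range(1, m + 1):
        total += (-1) ** (j - 1) * math.comb(nu, j) * (1.0 - j * g) ** (nu - 1)
    # Clamp against rounding error in the alternating sum.
    return min(1.0, max(0.0, total))

# With nu = 3 intensities and g = 0.5 only the first term enters: 3 * 0.5**2.
p_example = fisher_g_tail(0.5, 3)
```

For $\nu = 2$ the largest of two intensities is necessarily at least half their sum, so the function should return unity for any $g$ below one-half; this is a useful check on the sign pattern.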


30.53. All the tests we have described are based on random normal variation in the 
original series ; but in practice nobody would embark on the labour of a periodogram 
analysis unless he had satisfied himself that the data were not random. It seems to me, 
therefore, that these tests are really off the main point, being tests based on a hypothesis 
which we have already rejected. They are not without their usefulness, however. We 
may assume with some confidence that if a particular intensity in the series is not shown 
as significant on the hypothesis of random variation, it is not significant when the series 
is systematic. What does not follow is that if one intensity is significant then others must 
be so, even if they exceed the significance values; for they are not independent of the 
significant value, at least for short series. What we ought to do, perhaps, is to extract
the component which is considered significant from the series and then analyse the
remainder; and so on as long as significant terms appear. But this is hardly a practical
computational possibility. Tests of significance in the periodogram, as in the correlogram,
remain undiscovered.




Example 30.10 


Let us examine the significance of the 20 periods of the Beveridge periodogram given
in 30.44.

Sir William gave the value of $\frac{4\sigma^2}{n}$ in his original paper as 5.898. Expressing the
intensities as a multiple $\kappa$ of this amount, we find:

Period      κ             Period      κ
 2.735   [illegible]      11.000     5.74
 3.417     2.69           12.000     3.95
 4.417     2.79           12.800     7.80
 5.100     7.18           15.250    12.91
 5.415     4.01           17.333     9.25
 5.667     5.55           20.000     6.42
 5.933     4.01           24.000     4.43
 7.417     3.68           35.000     3.95
 8.091     3.94           54.000     4.42
 9.750     5.75           68.000     2.30

There are 305 trial periods in Table 30.9. Let us consider the probability that at least
one of 305 independent values of $\kappa$ will exceed given values, that is to say, the probabilities
given by (30.83). We find:

 κ     Probability
 2       1.000
 4       0.996
 6       0.531
 8       0.097
10       0.014


On this basis we should be inclined to attribute significance to the period 15.25, for which
$\kappa = 12.91$. We have no right to be surprised that at least one value exceeds $\kappa = 6$. If
we take this value as the critical one, only the periods 5.100, 12.800, 15.250, 17.333 and
20.000 would be significant, that is to say, five out of 20.

Again, since $e^{-5} = 0.007$, we should expect to find in 305 independent members two
in excess of 5. Actually there are eight. But they are not independent and we cannot
rely on this comparison to say that six are significant. On the whole, however, it looks
as if at least three-quarters of the periods are not significant, and possibly more. The
example will illustrate the difficulty of testing the significance of the periodogram as a whole.
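
The probabilities quoted in this example follow directly from Walker's expression $1 - (1 - e^{-\kappa})^m$ with $m = 305$. A minimal Python sketch, in which the function name `walker_probability` is hypothetical:

```python
import math

def walker_probability(kappa, m):
    """Chance that at least one of m independent intensities exceeds kappa times the mean."""
    return 1.0 - (1.0 - math.exp(-kappa)) ** m

M = 305  # number of trial periods in Table 30.9, as stated in the example

# Probabilities for kappa = 2, 4, 6, 8, 10.
table = {kappa: walker_probability(kappa, M) for kappa in (2, 4, 6, 8, 10)}

# Expected number of the 305 values exceeding kappa = 5 under independence.
expected_over_5 = M * math.exp(-5)
```

The expected count comes to a little over two, which is the figure set against the eight values actually observed.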


Lag Correlation 
30.54. The idea of serial correlation can be extended to the joint variation of two
series. If we have two series $u(t)$, $v(t)$ in standard measure, we may define the lag
correlation of order $k$ as

\[
r(k) = \frac{1}{2h}\int_{-h}^{h} u(t)\, v(t + k)\, dt \tag{30.88}
\]

where the integral includes summation in the case when the series are specified at
equidistant points of time. We note that in this case $r(k)$ is not equal to $r(-k)$ and $r(0)$
is not unity.
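
For series given at equidistant points the lag correlation reduces to an average of products. The sketch below is an illustration under assumptions stated here rather than taken from the text: the function names are hypothetical, each series is standardised over its whole length, the order $k$ pairs $u_t$ with $v_{t+k}$, and the average is taken over the overlapping terms only.

```python
import math

def standardise(x):
    """Express a series in standard measure: zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((xi - mean) ** 2 for xi in x) / n)
    return [(xi - mean) / sd for xi in x]

def lag_correlation(u, v, k):
    """Average of u_t * v_(t+k) in standard measure over the overlapping terms."""
    zu, zv = standardise(u), standardise(v)
    n = len(zu)
    products = [zu[t] * zv[t + k] for t in range(n) if 0 <= t + k < n]
    return sum(products) / len(products)

# Made-up example: v repeats u two time-units later.
u = [math.sin(2 * math.pi * t / 12) for t in range(120)]
v = [math.sin(2 * math.pi * (t - 2) / 12) for t in range(120)]

lags = range(-5, 6)
r = {k: lag_correlation(u, v, k) for k in lags}
best = max(lags, key=lambda k: r[k])
```

In this artificial case the largest value occurs at $k = 2$, and $r(2)$ differs markedly from $r(-2)$, illustrating that the lag correlogram is not symmetrical about zero.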




Table 30.10 shows the lag correlations between two series of English wheat prices and 
horse populations (for the original series see Kendall, 1944a). The data are shown as a lag 
correlogram in Fig. 30.10.


TABLE 30.10
Lag Correlations for Two Series of English Wheat Prices and Horse Populations (Deviations
from a Simple Nine-Year Average).

(The order of the correlation is the number of years by which horse population lags behind wheat price,
e.g. $r_{10}$ is the correlation of wheat price with the horse population of ten years earlier.)

Order of                    Order of
Correlation     r_k         Correlation     r_k
   −10         −0.22             1         −0.24
    −9         −0.19             2         −0.36
    −8         −0.24             3         −0.12
    −7         −0.16             4          0.16
    −6         −0.09             5          0.17
    −5          0.07             6          0.39
    −4          0.27             7          0.36
    −3          0.31             8          0.15
    −2          0.41             9         −0.16
    −1          0.25            10         −0.44
     0         −0.12


Fig. 30.10. Lag Correlation of Wheat Prices and Horse Populations (Table 30.10).




The systematic appearance is unmistakable and we notice in particular that the maximum 
correlation occurs between the wheat price and the horse population of two years later. 
This bears the obvious explanation that when a farmer earns more he buys or breeds more 
horses ; but it does not follow logically that this must be so or that there need be any 
causal nexus between the two series. If two autoregressive series are oscillating with 
mean periods which are close together and only a short span of experience is available for 
scrutiny, then lag correlations of the damped sinusoidal type may appear, as it were, by 
accident. 


30.55. We have now reached the end of our account of the statistical analysis of 
time-series and the end of this book ; and the final words we have to say of the one will 
apply generally to the other. Much has been left unsaid, partly from lack of space, partly 
from deficiencies in the present state of knowledge, and partly from a desire not to over- 
burden the reader. We have not avoided mathematical analysis where it was necessary 
to advance the argument ; but we have insisted on the expression of results in numerical 
form and the necessity of experimental confirmation whenever it could be obtained. That 
there are gaps in the treatment we have given and unexplored branches of the subject 
to which we have barely referred are not entirely matters of regret; for the over-early 
and peremptory reduction of knowledge into arts and methods is one of the errors which 
Bacon cautioned us against more than 300 years ago. Much remains to be done; and this 
book will have served its purpose if the reader is left with the desire to do some of it himself. 


NOTES AND REFERENCES 


The theoretical aspects of the autoregressive series and of moving averages are dis- 
cussed in Wold’s book on The Analysis of Stationary Time-Series (1938a). The basic 
memoir is that by Yule (1927a) on sunspots. For applications to meteorology see Walker 
(1931) and to economics Kendall (1944a). Davis’s book on The Analysis of Economic Time 
Series (1941) contains a great deal of interesting material but should not be read uncritically. 
Two earlier papers by Yule (1921 and 1926) are also of interest. See also my paper on
“The Analysis of Oscillatory Time-Series” in the Journal of the Royal Statistical Society
for 1945, a paper by Yule in the same journal, my brochure (in press) on “Researches in
Oscillatory Time-Series”, and a symposium introduced by Bartlett in the Supplement to
the Journal for 1946.

The classical work on periodogram analysis is that of Schuster (1898). The books 
by Brunt (1931) on The Combination of Observations and by Whittaker and Robinson 
(1940) on The Calculus of Observations contain useful introductory accounts ; and Davis’s 
book referred to above has an excellent chapter illustrated with an unusual number of 
examples. Papers by Crum (1923) and Greenstein (1935) are of interest. The papers by 
Sir William Beveridge (1921, 1922) on wheat prices and rainfall have been justly described 
by Davis as a heroic piece of periodogram analysis. Tables facilitating the calculation 
of intensities were published by Turner (1913), and more complete tables will be given in 
my brochure referred to above. See also the book by Stumpff (1937). 

Various short-cut methods of periodogram analysis have been proposed by several 
authors, e.g. Oppenheim (1909), Bruns (1921) and Alter (1933, 1937); but their value is 
problematical. There is a useful memoir by Bartels (1935) which is worth studying. 




EXERCISES 


30.1. For the autoregressive series

\[
u_{t+2} + a u_{t+1} + b u_t = \epsilon_{t+2}
\]

show that if $\epsilon$ is a random variable and the series is long,

\[
\frac{\operatorname{var} u}{\operatorname{var} \epsilon} = \frac{1 + b}{(1 - b)\{(1 + b)^2 - a^2\}}
\]

and hence that the variance of the generated series may be much greater than that of
$\epsilon$ itself.


30.2. For the autoregressive series of the previous exercise use the relation

\[
r_{k+2} + a r_{k+1} + b r_k = 0, \qquad k \ge -1,
\]

to derive the relation

\[
r_k = \frac{\rho^{k} \sin (k\theta + \psi)}{\sin \psi}.
\]


30.3. If the estimated coefficients a’ and b’ in the autoregressive scheme are reduced 
in the manner of 30.32 by a superposed error, show that 


(Yule, 1927a.) 


30.4. Show that if, in the autoregressive scheme of Exercise 30.1, b = 1, the series 
becomes undamped and the correlogram reduces to a simple harmonic. Examine the 
effect on the solution (30.23). 


30.5. If any series has fitted to it a series generated by the scheme of Exercise 30.1,
$a$ and $b$ being any constants, show that for the serial correlations of the residuals, say $\sigma_k$,
we have

\[
\sigma_k = \frac{(1 + a^2 + b^2)\rho_k + a(1 + b)(\rho_{k+1} + \rho_{k-1}) + b(\rho_{k+2} + \rho_{k-2})}{1 + a^2 + b^2 + 2a(1 + b)\rho_1 + 2b\rho_2}.
\]


30.6. Show that the series with an autocorrelation function

\[
r(k) = \frac{\sin \lambda k}{\lambda k}
\]

has a periodogram which is zero for periods less than $\frac{2\pi}{\lambda}$ and has ordinate $\frac{\pi}{\lambda}$ for periods greater
than $\frac{2\pi}{\lambda}$, i.e. has a continuous spectrum.


30.7. In equation (30.71), noting that the dominant term vanishes for « — co mull 
n 


bd 


where m is an integer, show that for such a “ vanishing ”’ trial period u 


b=a (0 ++ - a) approximately, 




. ‘ ; ane 2A? : 
Hence the width of a peak in the periodogram is approximately a5 and the main peak 


will be flanked by smaller peaks of the same width. (This “ side-band ” effect is another 
complication in the interpretation of the periodogram, but not apparently a very serious 
one.) 


30.8. If a series of values $u_1, \dots, u_n$ is supplemented by a number of zeros as
$u_0, u_{-1}, u_{-2}, \dots, u_{n+1}, u_{n+2}$, etc., as far as is necessary, and the resulting series differenced,
show that


= P.(2) ~27,(,2,) 427.(,2,)-... 22 ye, 


ad) 


where tT; is the sum of squares of jth differences and P; = 2% z,.;- Hence show that 


the arithmetic of serial correlation may be related to that of the variate-difference method,
and vice-versa.


30.9. Show that the serial correlations of a long series obtained by differencing a
random series $m$ times are given by

\[
r_k = (-1)^k \frac{m(m - 1) \dots (m - k + 1)}{(m + 1) \dots (m + k)}
\]

and hence that the correlogram of such a series oscillates.

(Yule, 1921.)
30.10. The Whittaker periodogram. Writing 


var m 


n? (uu eee 


where var wu is the variance of the series and var m is the variance of the sums m of (30.73), 
show that if 


Uy; ene ieee Dy b;, 
where 6; is uncorrelated with periodic terms, iw 


a? y? sin® —— 


2 = 
me) da? + var b 


Hence show that, in the neighbourhood of A, the graph of 7 as ordinate with u as abscissa 


2)2 
(Whittaker’s periodogram) has a peak of breadth == flanked by smaller peaks. 


(Whittaker, Month. Notes R. Astr. Soc., 1911, 71; cf. Whittaker and Robinson, Calculus of 
Observations.) 


APPENDIX A 
ADDENDA TO VOLUME I


(1) Frequency and Distribution Functions

An interesting paper by Burr (1942) considers the possibility of fitting elementary 
mathematical functions, not to the frequency function as has been the almost universal 
practice hitherto, but direct to the distribution function. This approach seems to merit 
further attention. In general, the distribution function has fewer analytical peculiarities 
than the frequency function—for instance, it cannot be infinite—and in applications to 
sampling it is the former which is nearly always required. The frequency function can, 
of course, be derived from the distribution function to a close approximation by differ- 
encing, or differentiation, processes which are usually easier to carry out than the inverse 
processes of integration. 


(2) Extension of the Carleman Criterion (4.22) 


Cramér and Wold (1936) have extended Carleman’s criterion for uniqueness in the
problem of moments in the following form:
If

\[
\lambda_r = \mu_{2r,0,0,\dots} + \mu_{0,2r,0,\dots} + \mu_{0,0,2r,\dots} + \dots
\]

the distribution is completely determined by its moments if

\[
\sum_{r=1}^{\infty} \frac{1}{\lambda_r^{1/(2r)}}
\]

diverges. It is rather interesting that the criterion is independent of the product-moments.


(3) Convergence of Series Leading to Standard Errors 


The usual type of expansion in differentials, exemplified in 9.6, raises a point of mathe- 
matical difficulty in that the differentials themselves and the remainder terms, though 
usually small, may sometimes be large for sampling reasons, however large the sample. 
The necessary rigorisation of the process has been given by Derkson (1939) in terms of the 
notion of stochastic convergence, that is to say, a sort of statistical convergence in which 
the series converges nearly always in a precisely defined sense. 


(4) Moments of Moments for Finite Populations 


The formulae for moments of the mean and variance in samples from a finite population 
were stated without proof in 11.26. It is obvious that if in these results we let N, the
population number, tend to infinity, we obtain the formulae for sampling from an infinite 
population. Irwin and I (1944) have recently shown that the process may be reversed 
and the formulae for the finite case derived from those for the infinite case. This offers 
the simplest and most direct method of deriving the formulae known to me. Reference 
may also be made to Sukhatme, “ On Bipartitional Functions ” (Phil. Trans., 1938, A, 
237, 375) and “ Moments and Product-Moments of Moment-statistics for Samples of the 
Finite and Infinite Populations” (Sankhyā, 1944, 6, 363).




(5) Tied Ranks 


In the treatment of rank correlation in Chapter 16 it was assumed that ranking was 
always possible ; but in practice cases occur when two or more individuals “ tie” and the 
ranks have to be equalised in some way. This possibility introduces the most intractable 
complications into theoretical work, but sometimes ties occur so frequently that a systema- 
tic method of dealing with them is necessary. The subject has been reviewed and recon- 
sidered by Woodbury (1940) and more recently by myself (Biom., 1945, 33, part 3). 


(6) Coefficients of Rank Correlation 

Daniels (1944) has recently unified the theory of rank correlation by showing that
Spearman’s $\rho$, my $\tau$ and the product-moment coefficient are particular cases of a general
coefficient. In particular he has demonstrated the formula for the covariance of $\rho$ and $\tau$
given in 16.24 as very probably true.


APPENDIX B 
BIBLIOGRAPHY 


The following Bibliography has no pretensions to completeness in spite of its length. 
It contains about half the titles recorded in my own notes, which themselves are doubtless 
far from comprehensive. Nevertheless, I hope it will be useful to those readers who want 
to take their studies of particular subjects somewhat further. By consulting the references 
given here and following up the references which they themselves provide, it should be 
possible for the reader to acquaint himself with most of what is known, or at least with 
what is worth knowing, about a particular topic. 

The names of authors are not included in the Index (pages 504 ff.) unless they occur 
in the text, since the Bibliography itself is arranged alphabetically under authors’ names. 
The subjects, however, are indexed, and anyone wishing to consult references on a par- 
ticular topic should refer in the first place to the Index, which in turn will refer to the 
authors who have dealt with the matter in question. 

In general the Bibliography contains only references to theoretical papers ; applica- 
tions and illustrative material are included only when some theoretical point is involved. 
Papers which have been superseded by later work are omitted, except where they have 
a historical interest. 

In compiling this material I have been particularly indebted to the valuable periodical 
reviews of Recent Advances in Mathematical Statistics by Irwin, Hartley and others in 
the Journal of the Royal Statistical Society: 1932, 95, 498; 1934, 97, 114; 1935, 98, 
88; 1936, 99, 714; 1938, 101, 394; 1939, 102, 406; and 1940, 103, 534. 

Many papers written since 1939 are included, but some journals are not available in 
war-time so that foreign work published after the entry of various countries into the war 
may be incompletely represented. Where possible, the references have been checked 
against the original publications, but here also I have had to rely on second-hand references 
in cases where the original papers were inaccessible. 

Note.—Names beginning with de, del, le, St., van, von, etc., are entered under those 
titles, i.e. the order is strictly alphabetical. 


ABERNETHY, J. R. (1933). On the elimination of systematic errors due to grouping. Ann. 
Math. Stats., 4, 263. 

ACKERMANN, W. G. (1939). Eine Erweiterung des Poissonschen Grenzwertsatzes und ihre 
Anwendung auf die Risikoprobleme in der Sachversicherung. Schrift. math. Inst. Berlin, 
pall ie 

ADCOCK, R. J. (1878). A problem in least squares. Analyst, 5, 53.

AITKEN, A. C., and OPPENHEIM, A. (1931). On Charlier’s new form of the frequency function.
Proc. Roy. Soc. Edin., 51, 35.

AITKEN, A. C. (1931). Some applications of generating functions to normal frequency. Quart. J.
Maths., 2, 130.

AITKEN, A. C. (1932). On the orthogonal polynomials in frequencies of Type B. Proc. Roy.
Soc. Edin., 52, 174.

AITKEN, A. C. (1933a). On the graduation of data by the orthogonal polynomials of least squares.
Proc. Roy. Soc. Edin., 53, 54.

AITKEN, A. C. (1933b, c). On fitting polynomials to weighted data by least squares. Proc. Roy.
Soc. Edin., 54, 1; and: On fitting polynomials to data with weighted and correlated
errors. Ibid., 54, 12.



AITKEN, A. C. (1935a). On least squares and linear combination of observations. Proc. Roy.
Soc. Edin., 55, 42.

AITKEN, A. C., and GONIN, H. T. (1935b). On fourfold sampling with and without replacement.
Proc. Roy. Soc. Edin., 55, 114.

AITKEN, A. C. (1937a, b, 1938). Studies in practical mathematics: I. The evaluation with
applications of a certain triple product matrix. Proc. Roy. Soc. Edin., 57, 172; II. The
evaluation of the latent roots and latent vectors of a matrix. Ibid., 57, 269; III. The
application of quadratic extrapolation to the evaluation of derivatives and to inverse
interpolation. Ibid., 58, 161.

AITKEN, A. C., and SILVERSTONE, H. (1942). On the estimation of statistical parameters. Proc.
Roy. Soc. Edin., 61, 186.

ALLAN, F. E. (1930). The general form of the orthogonal polynomials for simple series with
proofs of their simple properties. Proc. Roy. Soc. Edin., 50, 310.

ALLAN, F. E., and WISHART, J. (1930). A method of estimating the yield of a missing plot in
field experimentation work. J. Agr. Sci., 20, 399.

ALLEN, H. V. (1938). A theorem concerning the linearity of regression. Stat. Res. Mem., 2, 60.

ALLEN, R. G. D. (1939). The assumptions of linear regression. Economica, 6, 191.

ALT, F. L. (1942). Distributed lags. Econometrika, 10, 113.

ALTER, D. (1924). Application of Schuster’s periodogram to long rainfall records, beginning
1748. Monthly Weather Rev., 52, 479.

ALTER, D. (1925). Equations extending Schuster’s periodogram. Astr. J., 36, No. 850.

ALTER, D. (1926a). An examination by means of Schuster’s periodogram of rainfall data from
long records in typical sections of the world. Monthly Weather Rev., 54, 44.

ALTER, D. (1926b). The criteria of reality in the periodogram. Monthly Weather Rev., 54, 57.

ALTER, D. (1933). An extremely simple form of periodogram analysis. Proc. Nat. Acad. Sci.,
19, 335.

ALTER, D. (1937). A simple form of periodogram. Ann. Math. Stats., 8, 121.

ALTER, D. (1939). Correction of sample moment bias due to lack of high contact and to histogram
grouping. Ann. Math. Stats., 10, 192.

‘ALUMNUS’ (1932). A comparison of the effect of rainfall on spring and autumn-dressed wheat
at Rothamsted Experimental Station, Harpenden. J. Agr. Sci., 22, 101.

AMBARZUMIAN, G. (1937). Verteilungskurven der Wahrscheinlichkeiten, welche im Limit die
Verteilungskurven von Pearson ergeben. C.R. Acad. Sci. U.S.S.R., 16, 251.

ANDERSON, O. (1914). Nochmals über ‘The elimination of spurious correlation due to position
in time and space.’ Biom., 10, 269.

ANDERSON, O. (1923). Über ein neues Verfahren bei Anwendung der ‘Variate-difference’ Methode.
Biom., 15, 134. Corrigenda, 15, 423.

ANDERSON, O. (1926). Über die Anwendung der Differenzenmethode (Variate-difference method)
bei Reihenausgleichungen, Stabilitätsuntersuchungen, und Korrelationsmessungen.
Biom., 18, 293.

ANDERSON, O. (1927). On the logic of the decomposition of statistical series into separate
components. J.R.S.S., 90, 548.

ANDERSON, O. (1929). Die Korrelationsrechnung in der Konjunkturforschung. Schroeder,
Bonn.

ANDERSON, O. (1935). Einführung in die mathematische Statistik. Springer, Wien.

ANDERSON, P. H. (1942). Distributions in stratified sampling. Ann. Math. Stats., 13, 42.

ANDERSON, R. L. (1942). Distribution of the serial correlation coefficient. Ann. Math. Stats.,
13, 1.

ANDERSON, T. F. (1935). Some further notes upon experiments with actuarial functions and
Fourier’s series. J. Inst. Act., 67, 31.

ANDERSSON, W. (1932). Researches into the theory of regression. Medd. Lunds Astr. Obs., Series 2,
No. 64.




ANDERSSON, W. (1934). On a new method of computing non-linear regression curves. Ann.
Math. Stats., 5, 81.

ANDRÉ, D. (1884). Etude sur les maxima, minima et séquences des permutations. Ann. Ec.
Norm. Sup., (3), 1, 121.

AROIAN, L. A. (1937). The Type B Gram-Charlier Series. Ann. Math. Stats., 8, 183.

AROIAN, L. A. (1941). A study of R. A. Fisher’s z-distribution and the related F-distribution.
Ann. Math. Stats., 12, 429.

AROIAN, L. A. (1943). A new approximation to the levels of significance of the chi-square
distribution. Ann. Math. Stats., 14, 93.

AUMANN, G. (1934-1935). Aufbau von Mittelwerten mehrere Argumente. Math. Ann., 109,
935, and 111, 713.

AYYANGAR, A. A. K. (1934). Note on the recurrence formulae for the moments of the point
binomial. Biom., 26, 262; and: Note on the incomplete moments of the
hypergeometrical series. Ibid., 26, 264.

AYYANGAR, A. A. K. (1938). On the semi-invariants of two variates and their additive property.
Sankhyā, 4, 85, and J. Indian Math. Soc., 3, 1.


Bacon, H. M. (1938). Note on a formula for the multiple correlation coefficient. Ann. Math. 
Stats., 9, 227. 

Barey, A. L. (1931). The analysis of covariance. J. Am. Stat. Ass., 26, 424. 

Baker, G. A. (1930a). Transformations of bimodal distributions. Ann. Math. Stats., 1, 334. 

Baker, G. A. (19300). The significance of the product-moment coefficient, with special reference 
to the marginal distributions. J. Am. Stat. Ass., 25, 387. 

Baker, G. A. (1930c). Random samples from non-homogeneous populations. Metron, 8, 
No. 3, 67. 

Baker, G. A. (1930d). Distribution of the means of samples of nm drawn at random from a popula- 
tion represented by the Gram-Charlier Series. Ann. Math. Stats., 1, 199. 

Baker, G. A. (1931). The relation between the means and variances, means squared and variances 
in samples from combinations of normal populations. Ann. Math. Stats., 2, 333. 

Baker, G. A. (1932). Distribution of the means divided by the standard deviations of samples 
from non-homogeneous populations. Ann. Math. Stats., 3, 1. 

Baker, G. A. (1934). Transformation of non-normal frequency-distributions into normal dis- 
tributions. Ann. Math. Stats., 5, 113. 

Baker, G. A. (1935). Note on the distribution of the standard deviation and second moments 
from a Gram-Charlier distribution. Ann. Math. Stats., 6, 127. 

Baker, G. A. (1936). The probability that the mean of a second sample will differ from the 
mean of a first sample by less than a certain multiple of the standard deviation of the 
first sample. Ann. Math. Stats., 7, 197. 

Baker, G. A. (1937). Correlation surfaces of two or more indices when the components of the 
indices are normally distributed. Ann. Math. Stats., 8, 179. 

Baker, G. A. (1938). The probability that the standard deviation of a second sample will differ 
from the standard deviation of a first sample by a certain multiple of the first sample. 
Metron, 13, No. 3, 49. 

Baker, G. A. (1940). A comparison of Pearsonian approximations with exact sampling dis- 
tributions of means and variances. Ann. Math. Stats., 11, 219. 

Baker, G. A. (1941). Tests of homogeneity for normal populations. Ann. Math. Stats., 12, 233. 

Barbacki, S., and Fisher, R. A. (1936). A test of the supposed precision of systematic arrangements. 
Ann. Eug. Lond., 7, 189. 

Barnard, M. M. (1935). The secular variations of skull characters in four series of Egyptian 
skulls. Ann. Eug. Lond., 6, 352. 

Barnard, M. M. (1936). An enumeration of the confounded arrangements in the 2 x 2 x 2 
factorial designs. Supp. J.R.S.S., 3, 195. 




Bartels, J. (1935). Zur Morphologie geophysikalischer Zeitfunktionen. Sitz. Berl. Akad. Wiss., 
139. 

Bartky, W. (1943). Multiple sampling with constant probability. Ann. Math. Stats., 14, 363. 

Bartlett, M. S. (1933a). On the theory of statistical regression. Proc. Roy. Soc. Edin., 53, 260. 

Bartlett, M. S. (1933b). Probability and chance in the theory of statistics. Proc. Roy. Soc., 
A, 141, 518. 

Bartlett, M. S. (1934a). The problem in statistics of testing several variances. Proc. Camb. 
Phil. Soc., 30, 164. 

Bartlett, M. S. (1934b). The vector representation of a sample. Proc. Camb. Phil. Soc., 30, 
327. 

Bartlett, M. S. (1935a). The effect of non-normality on the t-distribution. Proc. Camb. Phil. 
Soc., 31, 223. 

Bartlett, M. S. (1935b). Contingency table interactions. Supp. J.R.S.S., 2, 248. 

Bartlett, M. S. (1935c). Some aspects of the time-correlation problem in regard to tests of 
significance. J.R.S.S., 98, 536. 

Bartlett, M. S. (1935d). An examination of the value of covariance in dairy-cow nutrition 
experiments. J. Agr. Sci., 25, 238. 

Bartlett, M. S. (1936a). The information available in small samples. Proc. Camb. Phil. Soc., 
32, 560. 

Bartlett, M. S. (1936b). Statistical information and properties of sufficiency. Proc. Roy. Soc., 
A, 154, 124. 

Bartlett, M. S. (1936c). A note on the analysis of covariance. J. Agr. Sci., 26, 488. 

Bartlett, M. S. (1936d). Square-root transformations in the analysis of variance. Supp. 
J.R.S.S., 3, 68. 

Bartlett, M. S. (1936e). Some notes on insecticide tests in the laboratory and in the field. 
Supp. J.R.S.S., 3, 185. 

Bartlett, M. S. (1937a). Sub-sampling for attributes. Supp. J.R.S.S., 4, 131. 

Bartlett, M. S. (1937b). Note on the derivation of fluctuation formulae for statistical assemblies. 
Proc. Camb. Phil. Soc., 33, 390. 

Bartlett, M. S. (1937c). Properties of sufficiency and statistical tests. Proc. Roy. Soc., A, 
160, 268. 

Bartlett, M. S. (1937d). Some examples of statistical methods of research in agriculture and 
applied biology. Supp. J.R.S.S., 4, 137. 

Bartlett, M. S. (1937e). The statistical conception of mental factors. Brit. J. Psych., 28, 97. 

Bartlett, M. S. (1938a). The approximate recovery of information from replicated experiments 
with large blocks. J. Agr. Sci., 28, 418. 

Bartlett, M. S. (1938b). The characteristic function of a conditional statistic. J. Lond. Math. 
Soc., 13, 62. 

Bartlett, M. S. (1938c). Further aspects of the theory of multiple regression. Proc. Camb. 
Phil. Soc., 34, 33. 

Bartlett, M. S. (1939a). Complete simultaneous fiducial distributions. Ann. Math. Stats., 
10, 129. 

Bartlett, M. S. (1939b). A note on tests of significance in multivariate analysis. Proc. Camb. 
Phil. Soc., 35, 180. 

Bartlett, M. S. (1939c). The standard errors of discriminant function coefficients. Supp. 
J.R.S.S., 6, 169. 

Bartlett, M. S. (1940). A note on the interpretation of quasi-sufficiency. Biom., 31, 391. 

Bartlett, M. S. (1941). The statistical significance of canonical correlations. Biom., 32, 29. 

Baten, W. D. (1931). Corrections for the moments of a frequency distribution in two variables. 
Ann. Math. Stats., 2, 309. 

Baten, W. D. (1933a). Frequency laws for the sum of n variables which are subject to given 
frequency laws. Metron, 10, No. 3, 75. 




Baten, W. D. (1933b). Sampling from many parent populations. Tohoku Math. Journ., 36, 206. 
Baten, W. D. (1934). The probability law for the sum of n independent variables, each subject 
to the law (1/2h) sech (πx/2h). Bull. Am. Math. Soc., 40, 284. 
Battin, I. L. (1942). On the problem of multiple matching. Ann. Math. Stats., 13, 294. 
Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Phil. Trans., 
53, 370. 
Beale, F. S. (1937). On the polynomials related to the differential equation 
$\frac{1}{y}\frac{dy}{dx} = \frac{a_0 + a_1 x}{b_0 + b_1 x + b_2 x^2} \equiv \frac{N}{D}$. 
Ann. Math. Stats., 8, 205. 
BEALL, G. (1939). Methods of estimating the population of insects in a field. Biom., 30, 422. 
BEALL, G. (1942). The transformation of data from entomological field experiments so that the 
analysis of variance becomes applicable. Biom., 32, 243. 
Beck, E. (1936). Existenzbeweise zur Wahrscheinlichkeitstheorie. Math. Zeit., 41, 222. 
Becker, R., Plaut, H., and Runge, I. (1930). Anwendung der mathematischen Statistik auf 
Probleme der Massenfabrikation. Springer, Berlin. 
Behrens, W. V. (1929). Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen. Landw. 


Pe ce an Cosi. Su una teoria astratta del calcolo della probabilita. Giorn. Ist. Ital. 

Benini, R. (1906). Principii di statistica metodologica. Unione Tipografica Editrice Torinese, 

Bennett, T. L. (1920). The theory of measurement of changes in the cost of living. J.R.S.S., 
83, 455. 

Berge, P. O. (1938). A note on a form of Tchebycheff’s theorem for two variables. Biom., 
29, 405. 


Bergström, S. (1918). Sur les moments de la fonction de corrélation normale de n variables. 
Biom., 12, 177. 

Berkson, J. (1930). Bayes’ theorem. Ann. Math. Stats., 1, 42. 

Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the 
chi-square test. J. Am. Stat. Ass., 33, 526. 

Bernoulli, J. (1713). Ars coniectandi. (A German translation in Ostwald’s Klassiker der 
Exakten Wissenschaften, Nos. 107 and 108.) 
Bernstein, F. (1932). Die mittleren Fehlerquadrate und Korrelation der Potenzmomente und 
ihre Anwendung auf Funktionen der Potenzmomente. Metron, 10, No. 3, 3. 
Bernstein, F. (1937). Regression and correlation evaluated by a method of partial sums. Ann. 
Math. Stats., 8, 77. 

Bernstein, S. (1927). Sur l’extension du théorème limite du calcul des probabilités aux sommes 
de quantités dépendantes. Math. Ann., 97, 1. 

Bernstein, S. (1936). Détermination d’une limite inférieure de la dispersion des sommes de 
grandeurs liées en chaîne singulière. Rec. Math. Moscou, 1, 29. 

Bernstein, S. (1937). Sur quelques modifications de l’inégalité de Tchebycheff. C.R. Acad. 
Sci. U.S.S.R., 17, 279. 

Bertrand, J. L. F. (1889). Calcul des probabilités. Gauthier-Villars, Paris. 

Besicovitch, A. S. (1932). Almost Periodic Functions. Cambridge University Press. 

Besson, L. (trans. and abridged by E. W. Woolard) (1920). On the comparison of meteorological 
data with chance results. Monthly Weather Rev., 48, 89. 

Beveridge, Sir W. H. (1921). Weather and harvest cycles. Econ. J., 31, 429. 

Beveridge, Sir W. H. (1922). Wheat prices and rainfall in Western Europe. J.R.S.S., 85, 412. 

Bhattacharya, D. P., and Narayan, R. D. (1942). Moments of D²-statistic for populations 
with unequal dispersions. Sankhyā, 5, 401. 

Bhattacharya, K. N. (1943). A note on twofold triple systems. Sankhyā, 6, 313. 




Bilham, E. G. (1926). Correlation coefficients. Q. J. Roy. Met. Soc., 52, 172. 

Bingham, M. D. (1941). A new method for obtaining the inverse matrix. J. Am. Stat. Ass., 
36, 530. 

Bishop, D. J. (1939). On a comprehensive test for the homogeneity of variances and covariances 
in multivariate problems. Biom., 31, 31. 

Bishop, D. J., and Nair, U. S. (1939). A note on certain methods of testing for the homogeneity 
of a set of estimated variances. Supp. J.R.S.S., 6, 89. 

Bispham, J. W. (1922). Note on a heterotypic frequency function. J.R.S.S., 85, 488. 

Bispham, J. W. (1920, 1923). An experimental determination of the distribution of the partial 
correlation coefficient in samples of thirty. Proc. Roy. Soc., A, 97, 1920, and Metron, 
2, 684, 1923. 

Blakeman, J. (1905). On tests for linearity of regression in frequency-distributions. Biom., 
4, 332. 

Blakeman, J., and Pearson, K. (1906). On the probable error of the coefficient of mean square 
contingency. Biom., 5, 191. 

Bliss, C. I. (1935). The calculation of the dosage-mortality curve. Ann. App. Biol., 22, 134; 
and: The comparison of dosage-mortality data. Ibid., 22, 307. 

Bliss, C. I. (1937). The calculation of the time-mortality curve. Ann. App. Biol., 24, 816. 

Bliss, C. I. (1938). The transformation of percentages for use in the analysis of variance. Ohio 
J. Sci., 38, 9. 

Blümel, H. (1939). Bemerkungen über die Sheppardsche Korrektur. Arch. Math. Wirtsch.- u. 
Sozialforschung, 5, 39. 

Boas, R. P., and Smithies, F. (1937). On the characterisation of a distribution function by its 
Fourier transform. Am. J. Maths., 60, 513. 

Bochner, S., and Jessen, B. (1934). Distribution functions and positive definite functions. 
Ann. Maths., 35, 252. 

Bochner, S. (1936). A converse of Poisson’s theorem in the theory of probability. Ann. Maths., 
37, 816. 

Bochner, S. (1937). Stable laws of probability and completely monotone functions. Duke 
Math. J., 3, 726. 

Bödewadt, G. T. (1936). Zum Momentproblem für das Intervall [0, 1]. Math. Zeit., 40, 426. 

Bohr, H. (1925). Zur Theorie fast periodischer Funktionen. Acta Math., 45, 29. 

Bonferroni, C. (1933). Sulla probabilità massima nello Schema di Poisson. Giorn. Ist. Ital. 
Att., 4, 109. 

Bonferroni, C. (1939). Di una estensione del coefficiente di correlazione. Giorn. degli Economisti, 
Nov.-Dec., p. 7. 

Borel, E. (editor) (1925 and subsequently). Traité du calcul des probabilités et de ses applications. 
Gauthier-Villars, Paris. 

Borel, E. (1933). Sur un problème élémentaire de probabilités et la quasi-périodicité de certains 
phénomènes arithmétiques. Comptes rendus, 196, 881. 

Borel, E. (1937). Sur l’imitation du hasard. Comptes rendus, 204, 203. 

Borel, E. (1939). Sur une interprétation des probabilités virtuelles. Comptes rendus, 208, 
1369; and: Sur certains problèmes de répartition et les probabilités virtuelles. Ibid., 
208, 1177. 

Bose, A. N. (1941). Some problems of field operations in labour inquiries. Sankhyā, 5, 229. 

Bose, C. (1943). Note on the sampling error in the method of double sampling. Sankhyā, 
6, 329. 

Bose, R. C. (1934). On the application of hyperspace geometry to the theory of multiple correlation. 
Sankhyā, 1, 338. 

Bose, R. C. (1936a). On the exact distribution and moment-coefficients of the D²-statistic. 
Sankhyā, 2, 143. 
Bose, R. C. (1936b). A note on the distribution of differences in mean values of two samples 
drawn from two multivariate normally-distributed populations, and the definition of 
the D²-statistic. Sankhyā, 2, 379. 

Bose, R. C. (1938a). On the distribution of the means of samples drawn from a Bessel function 
population. Sankhyā, 3, 262. 

Bose, R. C. (1938b). On the application of Galois fields to the problem of construction of 
Hyper-Graeco-Latin squares. Sankhyā, 3, 323. 

Bose, R. C., and Roy, S. N. (1938c). The distribution of the studentised D²-statistic. Sankhyā, 
4, 19. 

Bose, R. C. (1939). On the construction of balanced incomplete block designs. Ann. Eug. 
Lond., 9, 353. 

Bose, R. C., and Nair, K. R. (1939). Partially balanced incomplete block designs. Sankhyā, 
4, 337. 

Bose, R. C., and Roy, S. N. (1940). The use and distribution of the studentised D²-statistic 
when the variances and covariances are based on k samples. Sankhyā, 4, 535. 

Bose, R. C., and Kishen, K. K. (1941). On the problem of confounding in general symmetrical 
factorial designs. Sankhyā, 5, 21. 

Bose, R. C. (1942a). A note on the resolvability of balanced incomplete block designs. Sankhyā, 
6, 105. 

Bose, R. C., and Nair, K. R. (1942b). On complete sets of Latin squares. Sankhyā, 5, 361. 

Bose, S. N. (1935). On the complete moment coefficients of the D²-statistic. Sankhyā, 
2, 385. 

Bose, S. N. (1937). On the moment coefficients of the D²-statistic and certain integral and differential 
equations connected with the multivariate normal population. Sankhyā, 
3, 105. 

Bose, S. S. (1934a). Tables for testing the significance of linear regression in the case of time-series 
and other single-valued samples. Sankhyā, 1, 277. 

Bose, S. S. (1934b). A note on the mathematical expectation of the value of the regression 
coefficient. Sankhyā, 1, 432. 

Bose, S. S. (1935). On the distribution of the ratio of variances of two samples drawn from a 
given normal bivariate correlated population. Sankhyā, 2, 65. 

Bose, S. S. (1938a). On a Bessel function population. Sankhyā, 3, 253. 

Bose, S. S. (1938b). Relative efficiency of regression coefficients estimated by the method of 
finite differences. Sankhyā, 3, 339. 

Bose, S. S., and Mahalanobis, P. C. (1938a). On the exact test of association between the 
occurrence of thunderstorm and abnormal ionisation. Sankhyā, 3, 249. 

Bose, S. S., and Mahalanobis, P. C. (1938b). On estimating individual yields in the case of 
mixed-up yields of two or more plots in field experiments. Sankhyā, 4, 103. 

Bowley, A. L. (1912). The measurement of the accuracy of an average. J.R.S.S., 75, 77. 

Bowley, A. L. (1919). The measurement of changes in the cost of living. J.R.S.S., 82, 343. 

Bowley, A. L. (1920). Prices and Wages in the United Kingdom. Clarendon Press, Oxford. 

Bowley, A. L., and Smith, K. C. (1924). Seasonal variations in Finance, Prices and Industry. 
Lond. and Camb. Ec. Service, Special Memo. No. 7. 

Bowley, A. L. (1925). Measurement of the precision attained in sampling. Bull. Int. Inst. 
Stat., 22, 1er livre. 

Bowley, A. L. (1926). The influence on the precision of index-numbers of the correlation between 
the prices of commodities. J.R.S.S., 89, 300. 

Bowley, A. L. (1928). F. Y. Edgeworth’s Contributions to Mathematical Statistics. Royal Statistical 
Society, London. 

Bowley, A. L. (1933). The action of economic forces in producing frequency-distributions of 
income, prices and other phenomena. Econometrika, 1, 358. 

Bowley, A. L. (1938). Note on Professor Frisch’s ‘The Problem of Index Numbers’. Econometrika, 
6, 83. 




Bradley, P. D., and Crum, W. L. (1939). Periodicity as an explanation of variation in hog 
production. Econometrika, 7, 221. 

Brady, J. (1935). A biological application of the analysis of covariance. Supp. J.R.S.S., 2, 99. 

Brander, F. A. (1933). A test of the significance of the difference of the correlation coefficients 
in normal samples. Biom., 25, 102. 

Brandt, A. E. (1933). The analysis of variance in a 2 x s table with disproportionate frequencies. 
J. Am. Stat. Ass., 28, 164. 

Brelot, M. (1936, 1937). Sur l’influence des erreurs de mesure en statistique. J. Math. Pur. 
App., 15, 113, and 16, 285; also Congrès Int. de Math., Oslo (1936). 

Brelot, M. (1937). Quelques difficultés dans l’application pratique de la théorie des erreurs. 
Mathematica, 13, 243. 

Broderick, P. S. (1937). On some symbolic formulae in probability theory. Proc. Roy. Irish 
Acad., A, 44, 19. 

Broggi, U. (1934). Su di uno speciale problema dei momenti. Ann. di Mat., (4), 12, 63. 

Brown, G. M. (1933). On sampling from compound populations. Ann. Math. Stats., 4, 288. 

Brown, G. W. (1939). On the power of the L₁ test for equality of several variances. Ann. Math. 
Stats., 10, 119. 

Brown, G. W. (1940). Reduction of a certain class of statistical hypotheses. Ann. Math. Stats., 
11, 254. 

Brown, J. W., Greenwood, M., and Wood, Frances (1914). A study of index correlations. 
J.R.S.S., 77, 317. 

Brown, W. (1909). Some experimental results in correlation. Proceedings Sixth Int. Congress 
Psychology, Geneva. 

Brown, W., and Thomson, G. H. (1925). The essentials of mental measurement. Cambridge 
University Press. 

Brown, W. (1935). A note on the theory of two factors versus the sampling theory of mental 
ability. Brit. J. Psych., 25, 395. 

Brownlee, J. (1905). Statistical studies in immunity: small-pox and vaccination. Biom., 
4, 313. 

Brownlee, J. (1910). The significance of the correlation coefficient when applied to Mendelian 
distributions. Proc. Roy. Soc. Edin., 30, 473. 

Brownlee, J. (1911). The mathematical theory of random migration and epidemic distribution. 
Proc. Roy. Soc. Edin., 31, 262. 

Brownlee, J., and Morison, R. M. (1911). Notes on the calculation of the probabilities of life 
at high ages. J.R.S.S., 74, 201. 

Brownlee, J. (1918). Certain aspects of the theory of epidemiology in special reference to plague. 
Proc. Roy. Soc. Med., Sect. Epidem. and State Medicine, 10D, 85. 

Brownlee, J. (1924a). Experiments to test the theory of goodness of fit. J.R.S.S., 87, 76. 

Brownlee, J. (1924b). Test of periodogram analysis. J.R.S.S., 87, 83. 

Brownlee, J. (1925). Error in the correlation due to random sampling when proportionate 
mortalities are used. J.R.S.S., 88, 105. 

Bruen, C. (1938). Methods for the combination of observations, etc. Metron, 13, No. 2, 61. 

Bruns, H. (1906). Wahrscheinlichkeitsrechnung und Kollektivmasslehre. Teubner, Leipzig. 

Bruns, H. (1921). Über die Analyse periodischer Vorgänge. Astr. Nach., 188. 

Brunt, D. (1925). Periodicities in European weather. Phil. Trans., A, 225, 247. 

Brunt, D. (1928). Harmonic analysis and the interpretation of the results of periodogram investi- 
gations. Mem. R. Met. Soc., 2, No. 15, 47. 

Brunt, D. (1931). The Combination of Observations. Cambridge University Press. 

Bunak, V. V. (1936). Changes in the mean values of characters in mixed populations. Ann. 
Eug. Lond., 7, 195. 

Burkhardt, F., and Stackelberg, H. v. (1939). Zur Ableitung der Sheppardschen Korrektur. 
Arch. Math. Wirtsch.- u. Sozialforschung, 5, 127. 




Burks, B. S. (1933). A statistical method for estimating the distribution of sizes of completed 
fraternities in a population represented by a random sampling of individuals. J. Am. 
Stat. Ass., 28, 388. 

Burnside, W. (1924). On Bayes’ formula. Biom., 16, 189. 

Burnside, W. (1928). Theory of Probability. Cambridge University Press. 

Burr, I. W. (1942). Cumulative frequency functions. Ann. Math. Stats., 13, 215. 

Burrau, C. (1934). Contribution to the problem of dissection of a given frequency curve. Nordic 
Stat. J., 5, 43. 

Burt, C. (1927). Mental and Scholastic Tests. P. S. King, London. 

Burt, C. (1936). Marks of Examiners. Macmillan, London. 

Burt, C. (1937a). Correlations between persons. Brit. J. Psych., 28, 59. 

Burt, C. (1937b). Methods of factor analysis with and without successive approximations. Brit. 
J. Educ. Psych., 7, 172. 

Burt, C. (1938a). The unit hierarchy and its properties. Psychometrika, 3, 151. 

Burt, C. (1938b). Factor analysis by sub-matrices. J. Psych., 6, 339. 

Buys-Ballot, C. H. D. (1847). Les changements périodiques de température. Utrecht. 


Caccioppoli, R. (1932). Sull’ approssimazione per polinomi delle funzioni definiti in campi 
illimitati. Giorn. Ist. Ital. Att., 3, 364. 

Camp, B. H. (1922). A new generalisation of Tchebycheff’s statistical inequality. Bull. Am. Math. 
Soc., 28, 427. 

Camp, B. H. (1924). Probability integrals for the point binomial. Biom., 16, 163. 

Camp, B. H. (1925a). Probability integrals for the hypergeometric series. Biom., 17, 61. 

Camp, B. H. (1925b). Mutually consistent multiple regression surfaces. Biom., 17, 443. 

Camp, B. H. (1932). The converse of Spearman’s two-factor theorem. Biom., 24, 418. 

Camp, B. H. (1934). Spearman’s general factor again. Biom., 26, 260. 

Camp, B. H. (1937). Methods of obtaining probability distributions. Ann. Math. Stats., 8, 90. 

Camp, B. H. (1938a). Notes on the distribution of the geometric mean. Ann. Math. Stats., 
9, 221. 

Camp, B. H. (1938b). Further interpretations of the chi-square test. J. Am. Stat. Ass., 33, 537. 

Campbell, N. (1935). The statistical theory of errors. Proc. Phys. Soc., 47, 800. 

Campbell, N. (1939). Frequency interpretations in probability. Nature, 143, 601. 

Cannon, E. W., and Wintner, A. (1935). An asymptotic formula for a class of distribution 
functions. Proc. Edin. Math. Soc., 4, 138. 

Cantelli, F. P. (1913). Sulla differenza media con ripetizione. Giorn. Econ. e Riv. di Stat., 
February. 

Cantelli, F. P. (1916). La tendenza ad un limite nel senso del calcolo della probabilità. Rend. 
Circ. Mat. di Palermo, 16, 191. 

Cantelli, F. P. (1917). Sulla probabilità come limite della frequenza. Rend. R. Acc. Linc. 
(5), 26, 39. 

Cantelli, F. P. (1923). Sulla oscillazione delle frequenze intorno alla probabilità. Metron, 3, 
No. 2, 167. 

Cantelli, F. P. (1929). Sulla legge di distribuzione dei redditi. Giorn. Econ. e Riv. di Stat. 

Cantelli, F. P. (1932). Una teoria astratta del calcolo della probabilità. Giorn. Ist. Ital. Att., 
3, 257. 

Cantelli, F. P. (1933a). Considerazione sulla legge uniforme dei grandi numeri e sulla generalizzazione 
di un fondamentale teorema del Sig. Paul Lévy. Giorn. Ist. Ital. Att., 
4, 327. 


Cantelli, F. P. (1933b). Sulla determinazione empirica delle legge di probabilità. Giorn. Ist. 
Ital. Att., 4, 421. 

Cantelli, F. P. (1935). Considérations sur la convergence dans le calcul des probabilités. Ann. 
Inst. H. Poincaré, 5, 1. 




Cantelli, F. P. (1936). Considerazione su alcuni concetti esposti nella introduzione della nota 
di R. de Mises. Giorn. Ist. Ital. Att., 7, 256. 

Carleman, T. (1925). Les fonctions quasi-analytiques. Gauthier-Villars, Paris. 

Carlson, J. L. (1932). A study of the distribution of means estimated from small samples by the 
method of maximum likelihood for Pearson’s Type II curve. Ann. Math. Stats., 3, 86. 

Carmichael, F. L. (1931). Methods of computing seasonal indices. J. Am. Stat. Ass., 26, 135. 

Carslaw, H. S. (1930). Introduction to the Theory of Fourier’s Series and Integrals. Macmillan, 
London. 

Carver, H. C. (1932). Trapezoidal rule for computing seasonal indices. Ann. Math. Stats., 
3, 361. 

Carver, H. C. (1933). Note on the computation and modification of moments. Ann. Math. 
Stats., 4, 229. 

Carver, H. C. (1936). The fundamental nature and proof of Sheppard’s adjustments. Ann. 
Math. Stats., 7, 154. 

Castellano, V. (1933a). Sulle relazioni tra curve di frequenza e curve di concentrazione e sui 
rapporti di concentrazione corrispondenti a determinate distribuzioni. Metron, 10, 
No. 4, 3. 

Castellano, V. (1933b). Sulla interpretazione dinamica del rapporto di concentrazione. Giorn. 
Ist. Ital. Att., 4, 268. 

Castellano, V. (1934). Sullo scarto quadratico medio della probabilità di transvariazione. 
Metron, 11, No. 4, 19. 

Castellano, V. (1935). Recente letteratura sugli indici di variabilità. Metron, 12, No. 3, 101. 

Castellano, V. (1937). Sugli indici relativi di variabilità e sulla concentrazione dei caratteri 
con segno. Metron, 13, No. 1, 31. 

Castelnuovo, G. (1926-8). Calcolo della probabilità. Bologna. 

Castelnuovo, G. (1932). Sur quelques problèmes se rattachant au calcul des probabilités. Ann. 
Inst. H. Poincaré, 3, 465. 

Cave, B. M., and Pearson, K. (1914). Numerical illustrations of the variate-difference correlation 
method. Biom., 10, 340. 

Cave-Browne-Cave, F. E. (1904). On the influence of the time factor on the correlation between 
the barometric heights at stations more than 1000 miles apart. Proc. Roy. Soc., A, 
74, 403. 

Chandra Sekar, C., and Francis, M. G. (1941). A method to get the significance limit of a 
type of test criteria. Sankhyā, 5, 165. 

Chapelin, J. (1932). On a method of proceeding from partial cell-frequencies to ordinates and 
to total cell-frequencies in the case of a bivariate frequency surface. Biom., 24, 495. 

Chapman, D. W. (1935). The generalised problem of correct matchings. Ann. Math. Stats., 
6, 85. 

Chapman, R. A. (1938). Applicability of the z-test to a Poisson distribution. Biom., 30, 188. 

Charlier, C. V. L. (1906). Researches into the theory of probability. Medd. Lunds Astr. Obs. 

Charlier, C. V. L. (1912). Contributions to the mathematical theory of statistics. Medd. 
Lunds Astr. Obs. 

Charlier, C. V. L. (1928). A new form of the frequency function. Medd. Lunds Astr. Obs., 
Series 2, No. 51. 

Charlier, C. V. L. (1931). Applications [de la théorie des probabilités] à l’astronomie. (Part of 
the Traité edited by Borel.) Gauthier-Villars, Paris. 

Cheshire, L., Oldis, E., and Pearson, E. S. (1932). Further experiments on the sampling 
distribution of the correlation coefficient. J. Am. Stat. Ass., 27, 121. 

Chlodovsky, L. (1938). Le problème des moments et les polynômes de S. Bernstein. Comptes 
rendus Acad. Sci. U.S.S.R., 19, 659. 

Christidis, B. G. (1931). The importance of the shape of plot in field experimentation. J. Agr. 
Sci., 21, 14. 




Church, A. E. R. (1925). On the moments of the distributions of squared standard deviations 
for samples of N drawn from an indefinitely large population. Biom., 17, 79. 
Church, A. E. R. (1926). On the means and squared standard deviations of small samples from 
any population. Biom., 18, 321. 

Cisbani, R. (1938). Contributi alla teoria delle medie. Metron, 13, No. 2, 23, and No. 3, 3. 

Clapham, A. R. (1931). Studies in sampling technique: cereal experiments. J. Agr. Sci., 
21, 366 and 376. 

Clapham, A. R. (1936). Over-dispersion in grassland communities and the use of statistical 
methods in plant ecology. J. Ecology, 24, 232. 

Claremont, C. A. (1916). On the correlation between the ‘corrected’ cancer and diabetes 
death-rates. Biom., 11, 191. 

Clark, A., and Leonard, W. H. (1939). The analysis of variance with special reference to data 
expressed as percentages. J. Am. Soc. Agron., 31, 55. 

Clopper, C. J., and Pearson, E. S. (1934). The use of confidence or fiducial limits illustrated in 
the case of the binomial. Biom., 26, 404. 

Cobb, C. W. (1939). Note on Frisch’s diagonal regression. Econometrika, 7, 77. 

Cochran, W. G. (1934). The distribution of quadratic forms in a normal system, with applications 
to the analysis of covariance. Proc. Camb. Phil. Soc., 30, 178. 

Cochran, W. G. (1935). A note on the influence of rainfall on the yield of cereals in relation to 
manurial treatment. J. Agr. Sci., 25, 510. 

Cochran, W. G. (1936a). The χ²-distribution for the binomial and Poisson series with small 
expectations. Ann. Eug. Lond., 7, 207. 

Cochran, W. G. (1936b). Statistical analysis of field counts of diseased plants. Supp. J.R.S.S., 
3, 49. 

Cochran, W. G. (1937a). The efficiencies of the binomial series tests of significance of a mean 
and correlation coefficient. J.R.S.S., 100, 69. 

Cochran, W. G. (1937b). Problems arising in the analysis of a series of similar experiments. 
Supp. J.R.S.S., 4, 102. 

Cochran, W. G. (1938a). The omission or addition of an independent variate in multiple linear 
regression. Supp. J.R.S.S., 5, 171. 

Cochran, W. G. (1938b). Some difficulties in the statistical analysis of replicated experiments. 
Emp. J. Exp. Agr., 6, 157. 

Cochran, W. G. (1939a). Long-term agricultural experiments. Supp. J.R.S.S., 6, 104. 

Cochran, W. G. (1939b). The use of the analysis of variance in enumeration by sampling. 
J. Am. Stat. Ass., 34, 492. 

Cochran, W. G. (1940a). Note on an approximative formula for significance levels of z. Ann. 
Math. Stats., 11, 93. 

Cochran, W. G. (1940b). The analysis of variance when experimental errors follow the Poisson 
or binomial laws. Ann. Math. Stats., 11, 335. 

Cochran, W. G. (1941). The distribution of the largest of a set of variances as a fraction of their 
total. Ann. Eug. Lond., 11, 47. 

Cochran, W. G. (1942a). The χ² correction for continuity. Iowa State College J. Sci., 61, 421. 

Cochran, W. G. (1942b). Sampling theory when the sampling units are of unequal sizes. J. Am. 
Stat. Ass., 37, 199. 

Cochran, W. G. (1943). The comparison of different scales of measurement for experimental 
results. Ann. Math. Stats., 14, 205. 

Coleman, J. B. (1932). A coefficient of linear correlation based on the method of least squares 
and the line of best fit. Ann. Math. Stats., 3, 79. 

Comrie, L. J. (1936). Inverse interpolation and scientific applications of the National accounting 
machine. Supp. J.R.S.S., 3, 87. 

Comrie, L. J., Hey, G. B., and Hudson, H. G. (1937). Application of Hollerith equipment to 
an agricultural investigation. Supp. J.R.S.S., 4, 210. 




Comrie, L. J. (1939). Tables of tan⁻¹ x and log (1 + x²). Tracts for Computers, No. 23. Cambridge 
University Press. 

Comrie, L. J., and Hartley, H. O. (1941). Tables of Lagrangian coefficients for harmonic interpolation 
in certain tables of percentage points. Biom., 32, 183. 

Co-operative Study, see Soper, H. E. and others, 1917. 

Copeland, A. H. (1928). Admissible numbers in the theory of probability. Am. J. Maths., 
50, 535. 

Copeland, A. H. (1929). Independent event histories. Am. J. Maths., 51, 612. 

Copeland, A. H. (1932). The theory of probability from the point of view of admissible numbers. 
Ann. Math. Stats., 3, 143. 

Copeland, A. H. (1936). Point set theory applied to the random selection of the digits of an 
admissible number. Am. J. Maths., 58, 181. 

Copeland, A. H., and Regan, F. (1936). A postulational treatment of the Poisson law. Ann. 
Math., 37, 357. 

Copeland, A. H. (1937). Consistency of conditions determining collectives. Trans. Am. Math. 
Soc., 43, 333. 

Cornish, E. A. (1936). Non-replicated factorial experiments. J. Aus. Inst. Agr. Sci., 2, 79. 

Cornish, E. A., and Fisher, R. A. (1937). Moments and cumulants in the specification of distributions. 
Rev. Inst. Int. Stat., 5, 307. 

Cornish, E. A. (1940a, b, c). The estimation of missing values in incomplete randomised block 
experiments. Ann. Eug. Lond., 10, 112; The estimation of missing values in quasi-factorial 
designs. Ibid., 10, 137; The analysis of covariance in quasi-factorial designs. 
Ibid., 10, 269. 

Cowles, A. (1933). Can stock-market forecasters forecast? Econometrika, 1, 309. 

Cowles, A., and Chapman, E. N. (1935). A statistical study of climate in relation to pulmonary 
tuberculosis. J. Am. Stat. Ass., 30, 517. 

Cowles, A., and Jones, H. F. (1937). Some a posteriori probabilities in stock-market action. 
Econometrika, 5, 280. 

Cox, G. M., and Snedecor, G. W. (1936). Covariance used to analyse the relation between corn-yield 
and acreage. J. Farm. Econ., 18, 597. 

Cox, G. M. (1940). Enumeration and construction of balanced incomplete block configurations. 
Ann. Math. Stats., 11, 72. 

Craig, A. T. (1932). The simultaneous distribution of mean and standard deviation in small 
samples. Ann. Math. Stats., 3, 126. 

Craig, A. T. (1933a). On the correlation between certain averages for small samples. Ann. 
Math. Stats., 4, 127. 

Craig, A. T. (1933b). Variables correlated in sequence. Bull. Am. Math. Soc., 39, 129. 

Craig, A. T. (1936a). Note on a certain bilinear form that occurs in statistics. Am. J. Maths., 
58, 864. 

Craig, A. T. (1936b). A certain mean-value problem in statistics. Bull. Am. Math. Soc., 42, 670. 

Craig, A. T. (1938). On the independence of certain estimates of variance. Ann. Math. Stats., 
9, 48. 

Craig, A. T. (1939). On the mathematics of the representative method of sampling. Ann. 
Math. Stats., 10, 26. 

Craig, A. T. (1943). Note on the independence of certain quadratic forms. Ann. Math. Stats., 
14, 195. 

Craig, C. C. (1928). An application of Thiele’s seminvariants to the sampling problem. Metron, 
7, No. 4, 3. 

Craig, C. C. (1929a). Sampling when the parent population is of Pearson’s Type III. Biom., 
21, 287. 

Craig, C. C. (1929b). The frequency function of y/x. Ann. Math., 30, 471. 
Craig, C. C. (1931a). Sampling in the case of correlated observations. Ann. Math. Stats., 2, 324. 




Craig, C. C. (1931b). Note on the distribution of samples of N drawn from a Type A population.
Ann. Math. Stats., 2, 99.

Craig, C. C. (1931c). On a property of the seminvariants of Thiele. Ann. Math. Stats., 2, 154.

Craig, C. C. (1932). On the composition of dependent elementary errors. Ann. Math., 33, 184.

Craig, C. C. (1933). On the Tchebycheff inequality of Bernstein. Ann. Math. Stats., 4, 94.

Craig, C. C. (1936a). On the frequency function of xy. Ann. Math. Stats., 7, 1.

Craig, C. C. (1936b). A new exposition and chart for the Pearson system of frequency curves.
Ann. Math. Stats., 7, 16.

Craig, C. C. (1936c). Sheppard’s corrections for a discrete variable. Ann. Math. Stats., 7, 55.

Craig, C. C. (1940). The product seminvariants of the mean and a central moment in samples.
Ann. Math. Stats., 11, 177.

Craig, C. C. (1941). Note on the distribution of non-central t with an application. Ann. Math.
Stats., 12, 224.

Craig, C. C. (1941). A note on Sheppard’s corrections. Ann. Math. Stats., 12, 339.

Craig, J. I. (1916). A new method of discovering periodicities. Month. Not. R. Astr. Soc., 76, 493.

Cramér, H. (1923). Das Gesetz von Gauss und die Theorie des Risikos. Skand. Akt., 6, 209. 

Cramér, H. (1926). On some classes of series used in mathematical statistics. Skandinaviske 
Matematikercongres, Copenhagen. 

Cramér, H. (1928). On the composition of elementary errors. Skand. Akt., 11, 13 and 141. 

Cramér, H. (1934). Su un teorema relativo alla legge uniforme dei grandi numeri. Giorn. Ist.
Ital. Att., 5, 1.

Cramér, H. (1935a). Sur les propriétés asymptotiques d’une classe de variables aléatoires.
Comptes rendus, 201, 441. 

Cramér, H. (1935b). Sugli sviluppi asintotici di funzioni di repartizione in serie di polinomi di
Hermite. Giorn. Ist. Ital. Att., 6, 141. 

Cramér, H. (1936). Über eine Eigenschaft der normalen Verteilungsfunktion. Math. Zeit., 41, 405.

Cramér, H., and Wold, H. (1936). Some theorems on distribution functions. J. Lond. Math.
Soc., 11, 290.

Cramér, H. (1937). Random variables and probability distributions. Cambridge University Press.

Cramér, H. (1938-9). Entwicklungslinien der Wahrscheinlichkeitsrechnung. 9° Congrés des
Math. Scand., 67.

Cramér, H., Lévy, P., and von Mises, R. (1938). Les sommes et les fonctions de variables
aléatoires. Conf. Int. de Sci. Math., 3.

Crowther, G. (1934). The ‘Economist’ index of business activity. J.R.S.S., 97, 241.

Crum, W. L. (1923). Cycles of rates on commercial paper. Rev. Econ. Stats., 5, 17.

Crum, W. L. (1925). Progressive variation in seasonality. J. Am. Stat. Ass., 20, 48. 

Crum, W. L. (1933). An analytical interpretation of straw vote samples. J. Am. Stat. Ass., 
28, 152. 

Cureton, E. E., and Dunlap, J. W. (1938). Developments in statistical methods related to test
construction. Rev. Educ. Res., 8, 307.

Curtiss, J. H. (1941). On the distribution of the quotient of two chance variables. Ann. Math. 
Stats., 12, 409. 

Curtiss, J. H. (1943). On transformations used in the analysis of variance. Ann. Math. Stats., 
14, 107. 

Czuber, E. (1921). Die statistische Forschungsmethode. Seidel, Wien.

Czuber, E. (1921, 1923). Wahrscheinlichkeitsrechnung und ihre Anwendung auf Fehlerausgleichung,
Statistik und Lebensversicherung. Teubner, Leipzig. 


Daly, J. F. (1940). On the unbiassed character of likelihood ratio tests for independence in normal
systems. Ann. Math. Stats., 11, 1.

Daniels, H. E. (1938a). The effect of departures from ideal conditions other than non-normality
on the t- and z-tests of significance. Proc. Camb. Phil. Soc., 34, 321.




Daniels, H. E. (1938b). Some problems of statistical interest in wool research. Supp. J.R.S.S.,
5, 89.

Daniels, H. E. (1941). A property of the distribution of extremes. Biom., 32, 194.

Daniels, H. E. (1944). The relation between measures of correlation in the universe of sample
permutations. Biom., 33, 129.

Dantzig, G. B. (1939). On a class of distributions that approach the normal distribution function.
Ann. Math. Stats., 10, 247.

Dantzig, G. B. (1940). On the non-existence of tests of ‘Student’s’ hypothesis having power
functions independent of σ. Ann. Math. Stats., 11, 186.

Darmois, G. (1928). Statistique mathématique. Octave Doin, Paris.

Darmois, G. (1929). Analyse et comparaison des séries statistiques qui se développent dans
le temps. Metron, 8, Nos. 1-2, 211.

Darmois, G. (1933). Distributions statistiques rattachées à la loi de Gauss et la répartition des
revenus. Econometrika, 1, 159.

Darmois, G. (1934). Sur la théorie des deux facteurs de Spearman. Comptes rendus, 199, 1176
and 1358.

Darmois, G. (1935). Sur les lois de probabilité à estimation exhaustive. Comptes rendus, 200,
1265.

Darmois, G. (1936). L’emploi des observations statistiques. Méthodes d’estimation. Actualités
scientifiques et industrielles, No. 356. Paris. Hermann et Cie.

David, F. N. (1934). On the $P_{\lambda_n}$ test for randomness; remarks, further illustration and table
for $P_{\lambda_n}$. Biom., 26, 1.

David, F. N. (1937). A note on unbiassed limits for the correlation coefficient. Biom., 29, 157.

David, F. N. (1938a). Tables of the Correlation Coefficient. Cambridge University Press.

David, F. N. (1938b). Limiting distributions connected with certain methods of sampling human
populations. Stat. Res. Mem., 2, 69.

David, F. N., and Neyman, J. (1938c). Extension of the Markoff theorem on least squares.
Stat. Res. Mem., 2, 105.

David, F. N. (1939). On Neyman’s ‘smooth’ test for goodness of fit. I. Distribution of the
criterion ψ² when the hypothesis tested is true. Biom., 31, 191.

Davies, G. R. (1930). First moment correlation. J. Am. Stat. Ass., 25, 413. 

Davies, O. L. (1932). On the betas of quadrilateral distributions. Biom., 24, 498.

Davies, O. L. (1933, 1934). On asymptotic formulae for the hypergeometric series. I.
Hypergeometric series in which the fourth element is unity. Biom., 25, 295; II. Ibid.,
26, 59.

Davies, O. L., and Pearson, E. S. (1934). Methods of estimating from samples the population
standard deviation. Supp. J.R.S.S., 1, 76.

Davis, H. T. (editor) (1933, 1934). Tables of the Higher Mathematical Functions. Parts I and II. 
Bloomington Press, Indiana. 

Davis, H. T. (1933). Polynomial approximation by the method of least squares. Ann. Math. 
Stats., 4, 155. 

Davis, H. T. (1941). The Analysis of Economic Time Series. Bloomington Press, Indiana. 

Day, B., and Fisher, R. A. (1937). The comparison of variability in populations having unequal
means. An example of the analysis of covariance with multiple dependent and
independent variates. Ann. Eug. Lond., 7, 333.

de Finetti, B. (1929). Sulle funzioni a incremento aleatorio. Rend. R. Acc. Linc., (6) 10, 163.

de Finetti, B. (1930a). Le funzioni caratteristiche di legge istantanea. Rend. R. Acc. Linc.,
(6) 12, 278.

de Finetti, B., and Paciello, U. (1930b). Calcolo della differenza media. Metron, 8, No. 3, 89.

de Finetti, B. (1931). Sui metodi proposti per il calcolo della differenza media. Metron, 9,
No. 1, 3.

de Finetti, B. (1932). Sulla legge di probabilità degli estremi. Metron, 9, Nos. 3-4, 127.




de Finetti, B. (1933a). Classi di numeri aleatori equivalenti. Rend. R. Acc. Linc., (6) 18, 107;
La legge dei grandi numeri nel caso dei numeri aleatori equivalenti. Ibid., 18, 203; and
Sulla legge di distribuzione dei valori in una successione di numeri aleatori equivalenti.
Ibid., 18, 279.

de Finetti, B. (1933b). Sull’ approssimazione empirica di una legge di probabilità. Giorn. Ist.
Ital. Att., 4, 415.

de Finetti, B. (1937). La prévision: ses logiques, ses sources subjectives. Ann. Inst. H.
Poincaré, 7, 1.

de Finetti, B. (1939a). Resoconto critico del colloquio di Ginevra intorno alla teoria delle
probabilità. Giorn. Ist. Ital. Att., 9, 1.

de Finetti, B. (1939b). La teoria del rischio e il problema della rovina dei giocatori. Giorn.
Ist. Ital. Att., 10, 41.

Del Chiaro, A. (1936). Sui momenti delle leggi di distribuzione del Polya a più variabili. Giorn.
Ist. Ital. Att. [volume and page illegible].

Dell’Agnola, C. A. (1937). Sulla tendenza ad una variabile casuale limite di una successione
di variabili casuali punteggiate discontinue. Atti Ist. Veneto Sci., 96, 365.

De Lury, D. (1938). Note on correlations. Ann. Math. Stats., 9, 149.

Del Vecchio, E. (1933). Sulla dipendenza statistica. Giorn. Ist. Ital. Att., 4, 235.

Deming, W. E. (1931, 1934, 1935). On the application of least squares. I. Phil. Mag., (7),
11, 146; II. Ibid., 17, 804; III. Ibid., 19, 389.

Deming, W. E. (1934, 1938). The chi-test and curve fitting. J. Am. Stat. Ass., 29, 372; and:
Some thoughts on curve fitting and the chi-square test. Ibid., 33, 543.

Deming, W. E., and Birge, R. T. (1934). On the statistical theory of errors. Rev. Mod. Phys.,
6, No. 3, 122.

Deming, W. E. (1937). On the significant figures of least squares and correlations. Science,
85, 451.

De Moivre, A. (1718). The Doctrine of Chances. (3rd edition, 1756.)

Denk, F. (1936). Über den Aufbau der Permutation geordneter Elemente. J. für Math.,
176, 18. 

Derksen, J. B. D. (1939). On some infinite series introduced by Tschuprow. Ann. Math. Stats.,
10, 380. 

Detroit Edison Co. Statistical Department (1930). A mathematical theory of seasonal
indices. Ann. Math. Stats., 1, 57. 

de Vergottini, M. (1936). Relazioni fra gli indici di variabilità dei fenomeni collettivi composti
e quelli dei fenomeni collettivi semplici. Failli, Rome.

Dieulefait, C. E. (1934a). Contribution à l’étude de la théorie de la corrélation. Biom., 26, 379.

Dieulefait, C. E. (1934b). Sur les développements des fonctions des fréquences en séries de
fonctions orthogonales. Metron, 11, No. 4, 77.

Dieulefait, C. E. (1935a). Sur la corrélation au sens des modes. Comptes rendus, 200, 1511.

Dieulefait, C. E. (1935b). Généralisation des courbes de K. Pearson. Metron, 12, No. 2, 95.

Dixon, W. J. (1940). A criterion for testing the hypothesis that two samples are from the same 
population. Ann. Math. Stats., 11, 199. 

Dixon, W. J. (1944). Further contributions to the problem of serial correlation. Ann. Math.
Stats., 15, 119. 

Dodd, E. L. (1923). The greatest and the least variate under general laws of error. Trans.
Am. Math. Soc., 25, 525.

Dodd, E. L. (1926). The convergence of a general mean of measurements to the true value.
Bull. Am. Math. Soc., 32, 282.

Dodd, E. L. (1927). The convergence of general means and the invariance of form of certain
frequency functions. Am. J. Maths., 49, 215.

Dodd, E. L. (1930). The use of linear functions to detect hidden periods in data separated into
small sets. Ann. Math. Stats., 1, 205.




Dodd, E. L. (1931). Classification of sizes and measures by frequency functions. J. Am. Stat.
Ass., 26, [page illegible].

Dodd, E. L. (1934). The complete independence of certain properties of means. Ann. Math.,
35, 740.

Dodd, E. L. (1937a). Internal and external means arising from the scaling of frequency functions.
Ann. Math. Stats., 8, 12.

Dodd, E. L. (1937b). Regression coefficients as means of certain ratios. Am. Math. Monthly,
44, 306.

Dodd, E. L. (1937c). Index numbers and regression coefficients as means, internal and external.
Rep. Third Ann. Res. Conf. Econ. Stat., Colorado Springs, 13.

Dodd, E. L. (1938). Interior and exterior means obtained by the method of moments. Ann.
Math. Stats., 9, 153.

Dodd, E. L. (1939a). The length of the cycles which result from the graduation of chance elements.
Ann. Math. Stats., 10, 254.

Dodd, E. L. (1939b). Periodogram analysis with the phase a chance variable. Econometrika,
7, [page illegible].

Dodd, E. L. (1941a). The problem of assigning a length to the cycle to be found in a simple
moving average and in a double moving average of chance data. Econometrika,
9, [page illegible].

Dodd, E. L. (1941b). The cyclic effects of linear graduations persisting in the differences of the
graduated values. Ann. Math. Stats., 12, 127.

Dodd, E. L. (1942). Certain tests for randomness applied to data grouped into small sets.
Econometrika, 10, 249.

Dodd, S. C. (1927). On criteria for factorising correlated variables. Biom., 19, 45.

Doeblin, W. (1936, 1937). Sur les chaînes discrètes de Markoff. Comptes rendus, 203, 24 and
1210; and: Eléments d’une théorie générale des chaînes constantes simples de Markoff.
Ibid., 205, 7.

Doeblin, W. (1937). Sur le cas continu des probabilités en chaîne. Rend. R. Acc. Linc., 25,
170; Le cas discontinu des probabilités en chaîne. Pub. Fac. Sci. Univ. Masaryk,
No. 236, 3; and (with R. Fortet): Sur des chaînes à liaisons complètes. Bull. Soc.
Math. France, 65, 132.

Doeblin, W. (1938). Premiers éléments d’une étude systématique de l’ensemble de puissances
d’une loi de probabilité. Comptes rendus, 206, 306; and: Etude de l’ensemble de
puissances d’une loi de probabilité. Ibid., 206, 718.

Doeblin, W. (1938, 1939). Sur les sommes d’un grand nombre de vecteurs aléatoires. Comptes
rendus, 207, 511; Sur certains mouvements aléatoires. Ibid., 208, 249; Sur les sommes
d’un grand nombre de variables aléatoires indépendantes. Bull. Sci. Math., (2), 63,
23 and 35.

Doetsch, G. (1934). Die in der Statistik seltener Ereignisse auftretenden Charlierschen Polynome
und eine damit zusammenhängende Differenzialdifferenzgleichung. Math. Ann., 109, 257.

Donner, O. (1928). Die Saisonschwankungen als Problem der Konjunkturforschung. Vierteljahr- 
heften zur Konjunkturforschung, Sonderheft 6. Hobbing, Berlin. 

Doob, J. L. (1934a). Stochastic processes and statistics. Proc. Nat. Acad. Sci., 20, 376.

Doob, J. L. (1934b). Probability and statistics. Trans. Am. Math. Soc., 36, 759.

Doob, J. L. (1935). The limiting distributions of certain statistics. Ann. Math. Stats., 6, 160.

Doob, J. L. (1936). Statistical estimation. Trans. Am. Math. Soc., 39, 410.

Doob, J. L. (1937). Stochastic processes depending on a continuous parameter. Trans. Am.
Math. Soc., 42, 107.

Doob, J. L. (1938). Stochastic processes with an integral-valued parameter. Trans. Am. Math.
Soc., 44, 87.

Doob, J. L. (1941). Probability as measure. Ann. Math. Stats., 12, 206 (followed by discussion,
Doob and von Mises, 12, 215).




Doodson, A. T. (1917). Relation of the Mode, Median and Mean in frequency curves. Biom.,
11, 429.

Dörge, K. (1934). Eine Axiomatisierung der von Misesschen Wahrscheinlichkeitstheorie. Jber.
dtsch. Mat. Ver., 43, 39.

Dörge, K. (1936). Zu der von R. v. Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung.
Zweite Mitteilung. Allgemeine Wahrscheinlichkeitstheorie. Math. Zeit., 40,
161.

Dressel, P. L. (1940). Statistical seminvariants and their estimates, with particular emphasis
on their relation to algebraic invariants. Ann. Math. Stats., 11, 33.

Dressel, P. L. (1941). A symmetric method for obtaining unbiased estimates and expected
values. Ann. Math. Stats., 12, 84.

Dublin, L. I., Lotka, A. J., and Spiegelman, M. (1935). The construction of life tables by
correlation. Metron, 12, No. 2, 121.

Dubois, P. (1939). Formulas and tables for rank correlation. Psych. Rec., 3, 46.

Dubourdieu, J. (1939). Théorie de l’assurance-maladie. Paris.

Dugué, D. (1936a). Sur le maximum de précision des estimations gaussiennes à la limite. Comptes
rendus, 202, 193; and: Sur le maximum de précision des lois limites d’estimation.
Ibid., 202, 452.

Dugué, D. (1936b). Sur certaines modes de convergence de lois d’estimation. Comptes rendus,
202, 1732.

Dugué, D. (1937a). Sur une extension de la loi des grands nombres. Comptes rendus, 204, 317.

Dugué, D. (1937b). Application des propriétés de la limite au sens du calcul des probabilités
à l’étude des diverses questions d’estimation. J. Ecole Poly., 3, No. 4, 305.

Dugué, D. (1939). Sur quelques propriétés analytiques des fonctions caractéristiques. Comptes
rendus, 208, 1778.

Dunlap, H. F. (1931). An empirical determination of the distribution of means, standard deviations
and correlation coefficients drawn from rectangular populations. Ann. Math.
Stats., 2, 66.

Dwyer, P. S. (1937a). Moments of any rational integral isobaric sample moment function.
Ann. Math. Stats., 8, 21.

Dwyer, P. S. (1937b). The simultaneous computation of groups of regression equations and
associated multiple regression coefficients. Ann. Math. Stats., 8, 224. 

Dwyer, P. S. (1938). Combined expansions of products of symmetric power sums and of sums 
of symmetric power products with application to sampling. Ann. Math. Stats., 
9, 1 and 97. 

Dwyer, P. S. (1940). Combinatorial formulas for the rth standard moment of the sample sum, 
of the sample mean and of the normal curve. Ann. Math. Stats., 11, 353. 

Dwyer, P. S. (1941a). The solution of simultaneous equations. Psychometrika, 6, 101.

Dwyer, P. S. (1941b). The Doolittle technique. Ann. Math. Stats., 12, 449.

Dwyer, P. S. (1941c). The skewness of the residuals in linear regression theory. Ann. Math.
Stats., 12, 104.

Dwyer, P. S. (1942). Recent developments in correlation technique. J. Am. Stat. Ass., 37, 441.


Eden, T., and Yates, F. (1933). On the validity of Fisher’s z-test when applied to an actual
example of non-normal data. J. Agr. Sci., 23, 6.

Edgett, G. L. (1931). Frequency distributions with given statistics which are not all moments.
Metron, 9, No. 2, 25.

Edgeworth, F. Y., generally; see Bowley (1928).

Edgeworth, F. Y. (1905). The Law of Error. Trans. Camb. Phil. Soc., 20, 36 and 113 (with
an Appendix not printed in the T.C.P.S. but issued with reprints).

Edgeworth, F. Y. (1906). The generalised law of error, or law of great numbers. J.R.S.S.,
69, 497.




Edgeworth, F. Y. (1908, 1909). On the probable errors of frequency constants. J.R.S.S.,
71, 381, 499, 651, and 72, 81.

Edgeworth, F. Y. (1925a). Article ‘Index numbers’ in Palgrave’s Dictionary of Political
Economy, vol. 2, Macmillan.

Edgeworth, F. Y. (1925b). The plurality of index-numbers. Econ. J., 35, 379.

Edgeworth, F. Y. (1925c). The element of probability in index numbers. J.R.S.S., 88, 557.

Eells, W. C. (1929). Formulas for probable errors of coefficients of correlation. J. Am. Stat.
Ass., 24, 170.

Eggenberger, F. (1924). Die Wahrscheinlichkeitsansteckung. Mitt. Verein. Schweiz. Versich.
Math., Heft 19, 31.

Eisenhart, C. (1938). The power function of the χ²-test. Bull. Am. Math. Soc., 44, 32.

Eisenhart, C. (1939). The interpretation of certain regression methods and their use in biological
and industrial research. Ann. Math. Stats., 10, 162. 

Elderton, E. M. (1933). The Lanarkshire Milk Experiment. Ann. Eug. Lond., 5, 326.

Elderton, W. P. (1933). Adjustments for the moments of J-shaped curves. Biom., 25, 179.

Elderton, W. P., and Hansmann, G. H. (1934). Improvement of curves fitted by the method
of moments. J.R.S.S., 97, 330.

Elderton, Sir W. P. (1938a). Frequency Curves and Correlation, 3rd edn. Cambridge University
Press.

Elderton, Sir W. P. (1938b). Correzioni dei momenti quando la curva è simmetrica. Giorn.
Ist. Ital. Att., 16, 145.

Elfving, G. (1937, 1938). Zur Theorie der Markoffschen Ketten. Acta. Soc. Sci. Fennicae,
2, 1; and: Über die Interpretation von Markoffschen Ketten. Soc. Sci. Fennicae
Comment. phys.-nat., 10, No. 3, 1.

El Shanawany, M. R. (1936). An illustration of the accuracy of the χ²-approximation. Biom.,
28, 179.

Emmett, W. G. (1936). Sampling error and the two-factor theory. Brit. J. Psych., 26, 362.

Engelhart, M. D. (1936). The technique of path coefficients. Psychometrika, 1, 287.

Erdélyi, A. (1937). Sulle connessioni fra due problemi di calcolo delle probabilità. Giorn.
Ist. Ital. Att., 8, 328.

Erdélyi, A. (1938). Über eine erzeugende Funktion von Produkten Hermitescher Polynome.
Math. Zeit., 44, 201.

Erdös, P., and Turán, P. (1937, 1938). On Interpolation. I. Quadrature and mean-convergence
in the Lagrange interpolation. Ann. Math., 38, 142; and II. On the distribution
of fundamental points of Lagrange and Hermite interpolation. Ibid., 39, 703.

Erdös, P., and Kac, M. (1939). On the Gaussian law of errors in the theory of additive functions.
Proc. Nat. Acad. Sci., 25, 206.

Erdös, P. (1939). On the smoothness of the asymptotic distribution of additive arithmetical
functions. Am. J. Math., 61, 722.

Erdös, P., and Wintner, A. (1939). Additive arithmetical functions and statistical independence.
Am. J. Maths., 61, 713.

Esscher, F. (1932). On the probability function in the collective theory of risk. Skand. Akt.,
15, 175.

Euler, L. (1782). Recherches sur une nouvelle espèce de quarrés magiques. Verh. v. h. Zeeuwsch
Genootsch. der Wetensch. Vlissingen, 85.

Eyraud, H. (1938a). Sur quelques lois d’erreurs à deux dimensions. Comptes rendus, 206, 402.

Eyraud, H. (1938b). Sur certaines décompositions en aléatoires imaginaires. Comptes rendus,
206, 723.

Eysenck, H. J. (1939). The validity of judgments as a function of the number of judges.
J. Exp. Psych., 25, 650.

Ezekiel, M. (1930a). Methods of Correlation Analysis. John Wiley and Sons, New York.
(Chapman and Hall, London.)




Ezekiel, M. (1930b). The sampling variability of linear and curvilinear regression. Ann. Math.
Stats., 1, 275.


Falkner, H. D. (1924). On the measurement of seasonal variations. J. Am. Stat. Ass., 19, 167.

Farr, W. (1919, 1920). Farr’s law of density in relation to death rates. J.R.S.S., 82, 45, and 
83, 280. 

Fechner, G. T. (1897). Kollektivmasslehre. Engelmann, Leipzig.

Feld, W. (1924). Internationale Bibliographie der Statistik der Kindersterblichkeit. Metron,
3, Nos. 3-4, 604.

Feldheim, E. (1936a). Sur l’orthogonalité des fonctions fondamentales de l’interpolation de
Lagrange. Comptes rendus, 203, 650.

Feldheim, E. (1936b). Sur les probabilités en chaîne. Math. Ann., 112, 775.

Feldheim, E. (1937a). Sulle legge di probabilità stabili a due variabili. Giorn. Ist. Ital. Att.,
8, 146.

Feldheim, E. (1937b). Applicazioni dei polinomi di Hermite a qualche problema di calcolo delle
probabilità. Giorn. Ist. Ital. Att., 8, 303.

Feldman, H. M. (1935). Mathematical expectation of product-moments of samples drawn from
a set of infinite populations. Ann. Math. Stats., 6, 30.

Feller, W. (1936a). Zur Theorie der stochastischer Prozesse. Math. Ann., 113, 113.

Feller, W. (1936b, 1937). Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung.
Math. Zeit., 40, 521, and 42, 301.

Feller, W. (1937). Über das Gesetz der grossen Zahlen. Acta Litt. Sci. Szeged, 8, 191.

Feller, W. (1938). Note on regions similar to the sample space. Stat. Res. Mem., 2, 117.

Feller, W. (1943). On a general class of ‘contagious’ distributions. Ann. Math. Stats., 14, 389.

Fertic, J. W. (1936). On a method of testing the hypothesis that an observed sample of 
m variables and of size N has been drawn from a specified population of the same 
number of variables. Ann. Math. Stats., 7, 113. 

Fertig, J. W., and Proehl, E. A. (1937). A test of a sample variance based on both tail ends
of the distribution. Ann. Math. Stats., 8, 193. 

Fieller, E. C. (1931a). The duration of play. Biom., 22, 377.

Fieller, E. C. (1931b). A problem in probability. Biom., 22, 425.

Fieller, E. C. (1931c). The game of heads and tails. Biom., 23, 419.

Fieller, E. C. (1932a). Numerical test of the adequacy of A. T. McKay’s approximation.
J.R.S.S., 95, 699.

Fieller, E. C. (1932b). The distribution of an index in a normal bivariate population. Biom.,
24, 428.

Fieller, E. C. (1940). The biological standardisation of insulin. Supp. J.R.S.S., 7, 1.

Finney, D. J. (1938). The distribution of the ratio of estimates of the two variances in a sample
from a normal bivariate population. Biom., 30, 190.

Finney, D. J. (1940, 1941, 1942). The detection of linkage. Ann. Eug. Lond., 10, 171; 11, 10;
11, 115; 12, 31.

Finney, D. J. (1941a). The joint distribution of variance ratios based on a common-error mean
square. Ann. Eug. Lond., 11, 136.

Finney, D. J. (1941b). On the distribution of a variate whose logarithm is normally distributed.
Supp. J.R.S.S., 7, 55.

Fischer, C. H. (1933a). On correlation surfaces of sums with a certain number of random elements
in common. Ann. Math. Stats., 4, 103.

Fischer, C. H. (1933b). On multiple and partial correlation coefficients of a certain sequence of
sums. Ann. Math. Stats., 4, 278.

Fisher, A. (1922). The Mathematical Theory of Probabilities and its application to Frequency-curves
and Statistical Methods. 2nd edn. Macmillan, New York.

Fisher, Irving (1922). The Making of Index Numbers. Houghton Mifflin, Boston and New York.




Fisher, R. A. (1912). On an absolute criterion for fitting frequency curves. Mess. Maths.,
41, 155.

Fisher, R. A. (1915). Frequency-distribution of the values of the correlation coefficient in samples
from an indefinitely large population. Biom., 10, 507.

Fisher, R. A. (1918). The correlation between relatives on the supposition of Mendelian inheritance.
Trans. Roy. Soc. Edin., 52, 399.

Fisher, R. A. (1920). A mathematical examination of the methods of determining the accuracy
of an observation by the mean error and by the mean square error. Month. Not. R.
Astr. Soc., 80, 758.

Fisher, R. A. (1921a). On the mathematical foundations of theoretical statistics. Phil. Trans.
Roy. Soc., A, 222, 309.

Fisher, R. A. (1921b). Studies in crop-variation. I. An examination of the yield of dressed
grain from Broadbalk. J. Agr. Sci., 11, 107.

Fisher, R. A. (1921c). On the probable error of a coefficient of correlation deduced from a small
sample. Metron, 1, No. 4, 1.

Fisher, R. A. (1922a). On the interpretation of χ² from contingency tables and the calculation
of P. J.R.S.S., 85, 87.

Fisher, R. A. (1922b). The goodness of fit of regression formulae and the distribution of regression
coefficients. J.R.S.S., 85, 597.

Fisher, R. A., Thornton, H. G., and Mackenzie, W. A. (1922c). The accuracy of the plating
method of estimating the density of bacterial populations. Ann. App. Biol., 9, 325.

Fisher, R. A. (1923). Statistical tests of agreement between observation and hypothesis.
Economica, 3, 139.

Fisher, R. A. (1924a). The distribution of the partial correlation coefficient. Metron, 3, 329.

Fisher, R. A. (1924b). The influence of rainfall on the yield of wheat at Rothamsted. Phil.
Trans. Roy. Soc., B, 213, 89.

Fisher, R. A. (1924c). On a distribution yielding the error functions of several well-known
statistics. Proc. Int. Math. Congress, Toronto, p. 805.

Fisher, R. A. (1924d). The conditions under which χ² measures the discrepancy between observation
and hypothesis. J.R.S.S., 87, 442.

Fisher, R. A. (1925a, 1944). Statistical Methods for Research Workers. (1st edn. 1925. 9th
edn. 1944). Oliver and Boyd, Edinburgh.

Fisher, R. A. (1925b). Theory of statistical estimation. Proc. Camb. Phil. Soc., 22, 700.

Fisher, R. A. (1926a). Applications of ‘Student’s’ distribution. Metron, 5, No. 3, 90; and:
Expansion of ‘Student’s’ integral in powers of n⁻¹. Metron, 5, No. 3, 109.

Fisher, R. A. (1926b). On the random sequence. Q.J. Roy. Met. Soc., 52, 250.

Fisher, R. A. (1926c). Bayes’ theorem and the fourfold table. Eugenics Review, 18, 32.

Fisher, R. A., and Wishart, J. (1927). On the distribution of the error of an interpolated value
and on the construction of tables. Proc. Camb. Phil. Soc., 23, 912.

Fisher, R. A., and Tippett, L. H. C. (1928a). Limiting forms of the frequency-distribution of
the largest or smallest member of a sample. Proc. Camb. Phil. Soc., 24, 180.

Fisher, R. A. (1928b). The general sampling distribution of the multiple correlation coefficient.
Proc. Roy. Soc., A, 121, 654.

Fisher, R. A. (1928c). On a property connecting the χ² measure of discrepancy with the method
of maximum likelihood. Atti di Congresso Int. dei Matematici, Bologna, 6, 94.

Fisher, R. A. (1929a). Tests of significance in harmonic analysis. Proc. Roy. Soc., A,
125, 54.

Fisher, R. A. (1929b). Moments and product-moments of sampling distributions. Proc. Lond.
Math. Soc., (2), 30, 199.

Fisher, R. A. (1930a). Inverse Probability. Proc. Camb. Phil. Soc., 26, 528.

Fisher, R. A. (1930b). The moments of the distribution for normal samples of measures of
departure from normality. Proc. Roy. Soc., A, 130, 16.




Fisher, R. A., and Wishart, J. (1931). The derivation of the pattern formulae of two-way
partitions from those of simpler patterns. Proc. Lond. Math. Soc., 33, 195.

Fisher, R. A. (1932). Inverse probability and the use of likelihood. Proc. Camb. Phil. Soc.,
28, 257.

Fisher, R. A. (1933). The concepts of inverse probability and of fiducial probability referring
to unknown parameters. Proc. Roy. Soc., A, 139, 343.

Fisher, R. A. (1934a). Two new properties of mathematical likelihood. Proc. Roy. Soc., A,
144, 285.

Fisher, R. A. (1934b). Probability, likelihood and quantity of information in the logic of uncertain
inference. Proc. Roy. Soc., A, 146, 1.

Fisher, R. A., and Yates, F. (1934c). The 6 x 6 Latin square. Proc. Camb. Phil. Soc., 30, 492.

Fisher, R. A. (1934d). The effect of methods of ascertainment upon the estimation of frequencies.
Ann. Eug. Lond., 6, 13.

Fisher, R. A. (1935a). The logic of inductive inference. J.R.S.S., 98, 39.

Fisher, R. A. (1935b). The fiducial argument in statistical inference. Ann. Eug. Lond., 6, 391.

Fisher, R. A. (1935c, 1942). The Design of Experiments (1st edn. 1935, 3rd edn. 1942). Oliver
and Boyd, Edinburgh.

Fisher, R. A. (1936a). The use of multiple measurements in taxonomic problems. Ann. Eug.
Lond., 7, 179.

Fisher, R. A. (1936b). The coefficient of racial likeness. J. Roy. Anthrop. Soc., 66, 57.

Fisher, R. A. (1936c). Uncertain inference. Proc. Roy. Soc., B, 122, 1.

Fisher, R. A. (1937a). Professor Karl Pearson and the method of moments. Ann. Eug. Lond.,
7, 303.

Fisher, R. A. (1937b). On a point raised by M. S. Bartlett on fiducial probability. Ann. Eug.
Lond., 7, 370.

Fisher, R. A., and Yates, F. (1938a, 1942). Statistical Tables for use in Biological, Agricultural
and Medical Research. 2nd edn. 1942. Oliver and Boyd, Edinburgh.

Fisher, R. A. (1938b). Quelques remarques sur l’estimation statistique. Biotypologie, 6, 153.

Fisher, R. A. (1938c). The statistical utilisation of multiple measurements. Ann. Eug. Lond.,
8, 376.

Fisher, R. A. (1938d). Statistical Theory of Estimation. Calcutta Readership Lectures. Published
by the University of Calcutta.

Fisher, R. A. (1939a). The comparison of samples with possibly unequal variances. Ann.
Eug. Lond., 9, 174.

Fisher, R. A. (1939b). The sampling distribution of some statistics obtained from non-linear
equations. Ann. Eug. Lond., 9, 238.

Fisher, R. A. (1940a). On the similarity of the distributions found for the test of significance
in harmonic analysis and in Stevens’s problem in geometrical probability. Ann. Eug.
Lond., 10, 14.

Fisher, R. A. (1940b). An examination of the different possible solutions of a problem in incomplete
blocks. Ann. Eug. Lond., 10, 52.

Fisher, R. A. (1940c). A note on fiducial inference. Ann. Math. Stats., 10, 383.

Fisher, R. A. (1940d). The precision of discriminant functions. Ann. Eug. Lond., 10, 422.

Fisher, R. A. (1941a). The asymptotic approach to Behrens’ integral with further tables for
the d-test of significance. Ann. Eug. Lond., 11, 141.

Fisher, R. A. (1941b). The negative binomial distribution. Ann. Eug. Lond., 11, 182.

Fisher, R. A. (1942a). New cyclic solutions to problems in incomplete blocks. Ann. Eug. Lond.,
11, 290.

Fisher, R. A. (1942b). The likelihood solution of a problem in compounded probabilities. Ann.
Eug. Lond., 11, 306.

Fisher, R. A. (1942c). The theory of confounding in factorial experiments in relation to the
theory of groups. Ann. Eug. Lond., 11, 341.




Fisner, R. A. (1942d). Some combinatorial theorems and enumerations connected with the 
numbers of diagonal types in a Latin square. Ann. Hug. Lond., 11, 395. 

Fisner, R. A. (1942e). Completely orthogonal 9 x 9 squares: a correction. Ann. Eug. Lond., 
11, 402. 

Frioux, A. W. (1921, 1933). The measurement of price changes. J. Roy. Stat. Soc., 84, 167, 
and 96, 606. 

Fortet, R. (1935-8). Sur les probabilités en chaine. Comptes rendus, 201, 184, 202, 1362, and 
204, 315; and: Sur l’itération des substitutions algébriques linéaires à une infinité de
variables et ses applications à la théorie des probabilités en chaine. Rev. Ci., Lima,
40, 185, 337, 481. 

Frankel, A., and Kullback, S. (1940). A simple sampling experiment on confidence intervals.
Ann. Math. Stats., 11, 209.

Frankel, L. R., and Hotelling, H. (1938). The transformation of statistics to simplify their
distribution. Ann. Math. Stats., 9, 87.

Frankel, L. R., and Stock, J. S. (1939). The allocation of samplings among several strata.
Ann. Math. Stats., 10, 288.

Fréchet, M. (1930). Sur la convergence en probabilité. Metron, 8, No. 4, 3.

Fréchet, M., and Shohat, J. (1931). A proof of the generalised second-limit theorem. Trans.
Am. Math. Soc., 33, 533.

Fréchet, M. (1933). Sur le coefficient, dit de corrélation, et sur la corrélation en général. Rev.
Inst. Int. Stat., 4, 1.

Fréchet, M. (1935). Sur l’équation fonctionnelle de Chapman et sur le problème des probabilités
en chaine. Proc. Lond. Math. Soc., 39, 515.

Fréchet, M. (1936a). Sull’ espressione esatta di uno scarto medio. Giorn. Ist. Ital. Att.,
6, 164.

Fréchet, M. (1936b). Sul caso positivamente regolare nel problema delle probabilità concatenate.
Giorn. Ist. Ital. Att., 7, 28.

Fréchet, M. (1937a). Sulla mescolanza delle palline e sulle leggi-limite delle probabilità. Giorn.
Ist. Ital. Att., 8, 14.

Fréchet, M. (1937b). Recherches théoriques modernes. (Part of the Traité edited by Borel.)
Gauthier-Villars, Paris.

Frickey, E. (1937). The theory of index-number bias. Rev. Econ. Stat., 19, 161.

FRIEDMAN, M. (1937). The use of ranks to avoid the assumption of normality implicit in the 
analysis of variance. J. Am. Stat. Ass., 32, 675. 

Friedman, M. (1940). A comparison of alternative tests of significance for the problem of
m rankings. Ann. Math. Stats., 11, 86. 

Frisch, R. (1926). Sur les semi-invariants et moments employés dans l’étude des distributions
statistiques. Oslo, Skrifter af det Norske Videnskaps Academie, II, Hist.-Filos. Klasse,
No. 3.

Frisch, R. (1928). Changing harmonics and other general types of components in empirical
series. Skand. Akt., 11, 220.

Frisch, R. (1929). Correlation and scatter in statistical variables. Nordisk. Stat. J., 1, 36.

Frisch, R. (1930). Necessary and sufficient conditions regarding the form of an index-number
which shall meet certain of Fisher’s tests. J. Am. Stat. Ass., 25, 397.

Frisch, R. (1931). A method of decomposing an empirical series into its cyclical and progressive
components. J. Am. Stat. Ass., 26, Supp. p. 73.

Frisch, R., and Mudgett, B. D. (1931). Statistical correlation and the theory of cluster types.
J. Am. Stat. Ass., 26, 375.

Frisch, R. (1932). On the use of difference equations in the study of frequency-distributions.
Metron, 10, No. 3, 35.

Frisch, R. (1933). Propagation problems and impulse problems in dynamic economics. Economic
Essays in honour of Gustav Cassel. London. 




Frisch, R. (1934a). Robert Schmidt’s definition of skewness and kurtosis. Econometrika,
Dae, Paral : 

Frisch, R. (1934b). Statistical confluence analysis by means of complete regression equations.
Publication No. 5, Universitets Økonomiske Institut, Oslo.

Frisch, R. (1936). Annual survey of general economic theory. The problem of index numbers.
Econometrika, 4, 1.

Frisch, R. (1938). On the inversion of moving averages. Skand. Akt., 21, 218.

Fry, T. C. (1928). Probability and its Engineering Uses. van Nostrand, New York.

Fry, T. C. (1938). The χ²-test of significance. J. Am. Stat. Ass., 33, 513.


Galton, Sir Francis (1886). Regression towards mediocrity in hereditary stature. J. Anthrop.
Inst., 15, 246; and: Family likeness in stature. Proc. Roy. Soc., A, 40, 42.

Galton, Sir Francis (1902). The most suitable proportion between the values of first and second
prizes. Biom., 1, 385.

Galvani, L. (1931). Contributi alla determinazione degli indici di variabilità per alcuni tipi di
distribuzione. Metron, 9, No. 1, 3.

Galvani, L. (1932). Sulle curve di concentrazione relative a caratteri limitate e non limitate.
Metron, 10, No. 3, 61. 

GARNER, R. (1932). Concerning the limits of a measure of skewness. Ann. Math. Stats., 3, 358. 

Garwood, F. (1933). The probability integral of the correlation coefficient in samples from a
normal bivariate population. Biom., 25, 71.

Garwood, F. (1936). Fiducial limits for the Poisson distribution. Biom., 28, 437.

Garwood, F. (1940). An application of the theory of probability to the operation of
vehicular-controlled traffic signals. Supp. J.R.S.S., 7, 65.

Garwood, F. (1941). The application of maximum likelihood to dosage-mortality curves. Biom.,
32, 46. 

Geary, R. C. (1927). Some properties of correlation and regression in a limited universe. Metron,
7, No. 1, 83.

Geary, R. C. (1930). The frequency distribution of the quotient of two normal variables. 
J.R.S.S., 93, 442.

Geary, R. C. (1933). A general expression for the moments of certain symmetrical functions of 
normal samples. Biom., 25, 184. 

Geary, R. C. (1935a). The ratio of the mean deviation to the standard deviation as a test of 
normality. Biom., 27, 310. 

Geary, R. C. (1935b). Note on the correlation between β₂ and w′. Biom., 27, 353.

Geary, R. C. (1936a). Moments of the ratio of the mean deviation to the standard deviation for 
normal samples. Biom., 28, 295. 

Geary, R. C. (1936b). The distribution of ‘Student’s’ ratio for non-normal samples. Supp.
J.R.S.S., 3, 178.

Geary, R. C., and Pearson, E. S. (1938). Tests of Normality. Biometrika Office, London. 

Geary, R. C. (1942). The estimation of many parameters. J.R.S.S., 105, 213. 

Geary, R. C. (1943). Minimum range for quasi-normal distributions. Biom., 33, 100. 

Geary, R. C. (1944). Comparison of the concepts of efficiency and closeness for consistent
estimates of a parameter. Biom., 33, 123.

Gehlke, C. E., and Biehl, K. (1934). Certain effects of grouping upon the size of the correlation
coefficient in census tract material. J. Am. Stat. Ass., 29, Supp., 169. 

GEIRINGER, H. (1933). Korrelationsmessung auf Grund der Summenfunktion. Zeit. ang. Math. 
und Mech., 13, 121. 

GEIRINGER, H. (1934). Une méthode générale de statistique théorique. Comptes rendus, 198, 
420; and: Applications. Ibid., 198, 696.

Geiringer, H. (1938). On the probability theory of arbitrarily linked events. Ann. Math.
Stats., 9, 260 (and Errata, 10, 202). 




GEIRINGER, H. (1942). A new explanation of non-normal dispersion in the Lexis theory. 
Econometrika, 10, 53. 

Gini, C. (1912). Variabilità e Mutabilità, contributo allo studio delle distribuzioni e relazioni
statistiche. Studi Economico-Giuridici delle R. Università di Cagliari.

Gini, C. (1916). Indici di concordanza. Atti R. Ist. Veneto di Sci. Lett. ed Arte.

Gini, C. (1921). Sull’ interpolazione di una retta quando i valori della variabile indipendente
sono affetti da errori accidentali. Metron, 1, No. 3, 63.

Gini, C., and Galvani, L. (1929). Di talune estensioni del concetto di media ai caratteri
qualitativi. Metron, 8, Nos. 1-2, 3.

Gini, C. (1930). Sul massimo degli indici di variabilità assoluta, etc. Metron, 8, No. 3, 3.

Gini, C. (1932). Intorno alle curve di concentrazione. Metron, 9, Nos. 3-4, 3.

Gini, C. (1938). Di una formola comprensiva delle medie. Metron, 13, No. 2, 3.

Gini, C., and Zappa, G. (1938). Sulle proprietà delle medie potenziate e combinatorie. Metron,
13, No. 3, 21.

Gini, C. (1939). Sulla determinazione dell’ indice di cograduazione. Metron, 13, No. 4, 41.

Girshick, M. A. (1936). Principal components. J. Am. Stat. Ass., 31, 519.

Girshick, M. A. (1939). On the sampling theory of the roots of determinantal equations. Ann.
Math. Stats., 10, 203.

Girshick, M. A. (1942). Note on the distribution of roots of a polynomial with random complex
coefficients. Ann. Math. Stats., 13, 235; Correction, ibid., 13, 447. 

Glivenko, V. (1933). Sulla determinazione empirica delle leggi di probabilità. Giorn. Ist.
Ital. Att., 4, 92.

Glivenko, V. (1936). Sul teorema limite della teoria delle funzioni caratteristiche. Giorn.
Ist. Ital. Att., 7, 160.

Gnedenko, B. (1938). Über die Konvergenz der Verteilungsgesetze von Summen voneinander
unabhängiger Summanden. C.R. Acad. Sci. U.S.S.R., 18, 231.

Gonin, H. T. (1936). The use of factorial moments in the treatment of the hypergeometric
distribution and in tests for regression. Phil. Mag., (7), 21, 215.

Gordon, R. A. (1937). A selected bibliography of the literature of economic fluctuations. Rev.
Econ. Stat., 19, 37.

Gordon, R. D. (1939). Estimating bacterial populations by the dilution method. Biom., 31,
167.

Gordon, R. D. (1941). The estimation of a quotient when the denominator is normally
distributed. Ann. Math. Stats., 12, 115.

Gotaas, P. (1936). Formules de récurrence pour les semi-invariants à quelques lois de
distribution à plusieurs variables. Comptes rendus, 202, 619.

Goulden, C. H. (1937). Efficiency in field trials of pseudo-factorial and incomplete randomised
block methods. Canadian J. Res., 15, 231.

Goulden, C. H. (1938). Modern methods for testing a large number of varieties. Dom. Canada
Dep. Agr. Tech. Bull., 9.

Goulden, C. H. (1939). Methods of Statistical Analysis. John Wiley and Sons, New York.
(Chapman and Hall, London.) 

Gram, J. P. (1879). Om Rækkeudviklinger bestemte ved Hjaelp af de mindste Kvadraters Methode,
Copenhagen. Reprinted as Über die Entwicklung realer Funktionen in Reihen mittelst
der Methode der kleinsten Quadraten. J. für Math., 94, 41, 1894.

Greenleaf, H. E. H. (1932). Curve approximation by means of functions analogous to the
Hermite polynomials. Ann. Math. Stats., 3, 204.

Greenstein, B. (1935). Periodogram analysis with special application to business failures in
the United States. Econometrika, 3, 170.

Greenwood, J. A., and Stuart, C. E. (1937). Mathematical techniques used in extra-sensory
perception research. J. Parapsychology, 1, 206.

Greenwood, J. A. (1938). Variance of a general matching problem. Ann. Math. Stats., 9, 56.





Greenwood, J. A., and Greville, T. N. E. (1939). On the probability of attaining a given
standard deviation ratio in an infinite series of trials. Ann. Math. Stats., 10, 297.

Greenwood, J. A. (1940). The first four moments of a general matching problem. Ann. Eug.
Lond., 10, 290.

Greenwood, M., and Yule, G. U. (1915). The statistics of anti-typhoid and anti-cholera
inoculations, and the interpretation of such statistics in general. Proc. Roy. Soc. Medicine,
8, 113.

Greenwood, M., and Yule, G. U. (1917). On the statistical interpretation of some bacteriological
methods employed in water analysis. J. Hygiene, 21, 36.

Greenwood, M., and Yule, G. U. (1920). An inquiry into the nature of frequency-distributions
of multiple happenings, etc. J.R.S.S., 83, 255.

Greenwood, M. (1922). The value of life tables in statistical research. J.R.S.S., 85, 537.

Gressens, O. (1925). On the measurement of seasonal variations. J. Am. Stat. Ass., 20, 203. 

Greville, T. N. E. (1938). Exact probabilities for the matching hypothesis. J. Parapsychology,
2, 50.

Greville, T. N. E. (1939). Invariance of the admissibility of numbers under certain general
types of transformations. Trans. Am. Math. Soc., 46, 410.

Greville, T. N. E. (1941). The frequency-distribution of a general matching problem. Ann.
Math. Stats., 12, 350.

Grüneberg, H., and Haldane, J. B. S. (1937). Tests of goodness of fit applied to records of
Mendelian segregation in mice. Biom., 29, 144.

Guldberg, A. (1922). Sur un théorème de M. Markoff. Comptes rendus, 175, 679.

GuLDBERG, A. (1934). On discontinuous frequency functions of two variables. Skand. Akt., 


I ee 

Guldberg, A. (1935). Sur les lois de probabilités et la corrélation. Ann. Inst. H. Poincaré,
5, 159.

Guldberg, S. (1935). Sui momenti della legge di distribuzioni del Pólya. Giorn. Ist. Ital. Att.,
6, 394. 


Gulotta, B. (1938). Sulla legge di probabilità della differenza tra la media empirica e il valore
medio teorico dei quadrati d’ una variabile casuale che segue la legge normale. Giorn.
Ist. Ital. Att., 9, 245.

Gumbel, E. J. (1924). Eine Darstellung der Sterbetafel. Biom., 16, 283 (and Correction, ibid.,
411).

Gumbel, E. J. (1925). Lebenserwartung und mittleres Alter der Lebenden. Biom., 17, 173.

Gumbel, E. J. (1932). La distribuzione dei decessi secondo la legge di Gauss. Giorn. Ist. Ital.
Att., 3, 311. 

Gumbel, E. J. (1934). Les valeurs extrêmes des distributions statistiques. Ann. Inst. H. Poincaré,
5, 115.

Gumbel, E. J. (1935a). Les m-ièmes valeurs extrêmes et le logarithme du nombre d’observations.
Comptes rendus, 200, 509.

Gumbel, E. J. (1935b). Le plus grand âge, distribution et série. Comptes rendus, 201, 318.

Gumbel, E. J. (1937). La durée extrême de la vie humaine. Actualités Scientifiques et
Industrielles, No. 520. Paris. Hermann et Cie.

Gumbel, E. J. (1938a). La prévision des inondations. Comptes rendus, 206, 558; and: La
distribution des inondations, Akt. Vedy Roc., 7, 85.

Gumbel, E. J. (1938b). Gli eventi compatibili. Giorn. Ist. Ital. Att., 9, 3 and 58.

Gumbel, E. J. (1939). Les valeurs de position d’une variable aléatoire. Comptes rendus, 208,
149.

Gumbel, E. J. (1941). The return period of flood flows. Ann. Math. Stats., 12, 163.

Gumbel, E. J. (1942). Simple tests for given hypotheses. Biom., 32, 317.

Gumbel, E. J. (1943a). On serial numbers. Ann. Math. Stats., 14, 163.

Gumbel, E. J. (1943b). On the reliability of the classical χ²-test. Ann. Math. Stats., 14, 253.




Haavelmo, T. (1941). A note on the variate-difference method. Econometrika, 9, 74.

Haberler, G. (1927). Der Sinn der Indexzahlen. Mohr, Tübingen.

Hadamard, J., and Fréchet, M. (1933). Sur les probabilités discontinues des événements en
chaine. Zeit. ang. Math. und Mech., 13, 92.

Haldane, J. B. S. (1937). The exact value of the moments of the distribution of χ², used as a
test of goodness of fit, when expectations are small. Biom., 29, 133.

Haldane, J. B. S. (1938, 1939, 1940). The first six moments of χ² for an n-fold table with
n degrees of freedom when some expectations are small. Biom., 29, 389; The mean and
variance of χ² when used as a test of homogeneity when samples are small. Biom., 31,
346; The cumulants and moments of the binomial distribution and the cumulants of
χ² for an n × 2-fold table. Biom., 31, 392; Corrections to formulae in papers on the
moments of χ². Biom., 31, 220.

Haldane, J. B. S. (1938). The approximate normalisation of a class of frequency-distributions.
Biom., 29, 392.

Haldane, J. B. S. (1941). The cumulants of the distribution of the square of a variate. Biom.,
32, 199.

Haldane, J. B. S. (1942a). Moments of the distributions of powers and products of normal
variates. Biom., 32, 226.

Haldane, J. B. S. (1942b). The mode and median of a nearly normal distribution with given
cumulants. Biom., 32, 294.

Hall, P. (1927a). Multiple and partial correlation coefficients in the case of an n-fold variate
system. Biom., 19, 100.

Hall, P. (1927b). The distribution of means for samples of size N drawn from a population in
which the variate takes values between 0 and 1, all such values being equally probable.
Biom., 19, 240. 

Halphen, E. (1939). Sur la convergence des estimations. Comptes rendus, 208, 708.

Hamburger, H. (1920, 1921). Über eine Erweiterung des Stieltjesschen Momentproblems. Math.
Ann., 81, 235; 82, 120 and 168. 

Hansen, M. H., and Hurwitz, W. N. (1943). On the theory of sampling from finite populations. 
Ann. Math. Stats., 14, 333. 

Hansmann, G. H. (1934). On certain non-normal symmetrical frequency-distributions. Biom., 
26, 129. 

Harris, J. A. (1914). On the calculation of intra-class and inter-class coefficients of correlation 
from class-moments when the number of possible combinations is large. Biom., 
9, 446.

Harris, J. A., and Gunstad, B. (1931). Extension of Pearson’s correlation method to
intra-class and inter-class relationships. J. Agr. Sci., 42, 279.

Harris, J. A., and Treloar, A. E. (1927). On a limitation in the applicability of the contingency
coefficient. J. Am. Stat. Ass., 22, 460; and: Harris and Chi Tu. A second category
of limitations in the applicability of the contingency coefficient. Ibid., 24, 367. (Reply
by K. Pearson, J. Am. Stat. Ass., 25, 320.) 

Hart, B. I. (1942). Significance levels for the ratio of the mean square successive difference to 
the variance. Ann. Math. Stats., 13, 445. 

Hartley, H. O. (1938). Studentisation and large sample theory. Supp. J.R.S.S., 5, 80.

Hartley, H. O. (1940). Testing the homogeneity of a set of variances. Biom., 31, 249.

Hartley, H. O. (1942). The range in normal samples. Biom., 32, 334.

Hartley, H. O. (1944). Studentization, or the elimination of the standard deviation of the
parent population from the random sample-distribution of statistics. Biom., 33, 173.

Hartman, P., van Kampen, E. R., and Wintner, A. (1937). Mean motions and distribution
functions. Am. J. Maths., 59, 261.

Hartman, P., van Kampen, E. R., and Wintner, A. (1938). On the distribution functions of
almost periodic functions. Am. J. Maths., 60, 491. 




Hartman, P., van Kampen, E. R., and Wintner, A. (1939). Asymptotic distributions and 
statistical independence. Am. J. Maths., 61, 477.

Harzer, P. (1933). Tabellen für alle Statistischen Zwecke. Abhandlungen des Bayerischen Ak.
der Wiss., Math-naturwiss. Abteilung, Neue Folge, Heft 21.

Hausdorff, F. (1923). Momentprobleme für ein endliches Intervall. Math. Zeit., 16, 220.

Haviland, E. K. (1934a). On the theory of absolutely additive distribution functions. Am. J.
Maths., 56, 625.

Haviland, E. K. (1934b). On distribution functions and their Laplace-Fourier transform. Proc.
Nat. Acad. Sci., 20, 50; and (with A. Wintner): On the Fourier-Stieltjes transform.
Am. J. Maths., 56, 1.

Haviland, E. K. (1935). On the inversion formula for Fourier-Stieltjes transforms in more than
one dimension. Am. J. Maths., 57, 94, and 57, 382. Also: Note, 57, 569.

Haviland, E. K. (1935, 1936). On the moment problem for distribution functions in more
than one dimension. Am. J. Maths., 57, 562, and 58, 164.

Haviland, E. K. (1939). Asymptotic probability distributions and harmonic curves. Am. J.
Maths., 61, 947. 

Helguero, F. (1906). Per la risoluzione delle curve dimorfiche. Rend. R. Acad. Linc., 6.

Helmert, F. R. (1875). Über die Berechnung des wahrscheinlichen Fehlers aus einer endlichen
Anzahl wahrer Beobachtungsfehler. Zeit. für Math. und Phys., 20, 300.

Helmert, F. R. (1876a). Über die Wahrscheinlichkeit der Potenzsummen und über einige damit
in Zusammenhange stehende Fragen. Zeit. für Math. und Phys., 21, 192.

Helmert, F. R. (1876b). Die Genauigkeit der Formel von Peters zur Berechnung des
wahrscheinlichen Beobachtungsfehlers direkter Beobachtungen gleicher Genauigkeit.
Astronomische Nachrichten, 88, No. 2096. 

HENDERSON, J. (1922). On expansions in tetrachoric functions. Biom., 14, 157. 

Henderson, R. (1907). Frequency curves and moments. J. Inst. Act., 41, 429.

Hendricks, W. A. (1931). The use of the relative residual in the application of the method of
least squares. Ann. Math. Stats., 2, 458.

Hendricks, W. A. (1934). The standard error of any analytic function of a set of parameters
evaluated by the method of least squares. Ann. Math. Stats., 5, 107.

Hendricks, W. A. (1935). The analysis of variance considered as an application of simple error
theory. Ann. Math. Stats., 6, 117. 

Hendricks, W. A. (1936). An approximation to ‘Student’s’ distribution. Ann. Math. Stats.,
7, 210.

Hendricks, W. A., and Robey, K. W. (1936). The sampling distribution of the coefficient of
variation. Ann. Math. Stats., 7, 129.

Hersch, L. (1934). Essai sur les variations périodiques et leur mensuration. Metron, 12, No. 1, 3.

Hey, G. B. (1938). A new method of experimental sampling illustrated on certain non-normal
populations. Biom., 30, 68.

Hildebrandt, E. H. (1931). Systems of polynomials connected with the Charlier expansions
and the Pearson differential equations. Ann. Math. Stats., 2, 379.

Hilton, J. (1924, 1928). Enquiry by sample; an experiment and its results. J.R.S.S., 87,
544; and: Some further enquiries by sample. Ibid., 91, 519.

Hirschfeld, H. O. (1935). A connection between correlation and contingency. Proc. Camb.
Phil. Soc., 31, 520.

Hirschfeld, H. O. (1937). The distribution of the ratio of covariance estimates in two samples
drawn from normal bivariate populations. Biom., 29, 65.

Hoel, P. G. (1937). A significance test for component analysis. Ann. Math. Stats., 8, 149.

Hoel, P. G. (1938). On the chi-square distribution for small samples. Ann. Math. Stats., 9, 158.

Hoel, P. G. (1939). A significance test for minimum rank in factor analysis. Psychometrika,
4, 245.

Hoel, P. G. (1941). On methods of solving normal equations. Ann. Math. Stats., 12, 354.




Hojo, T. (1931, 1933). Distribution of the median, quartiles and interquartile distance in samples
from a normal population. Biom., 23, 315; and: A further note on the relation between
the median and the quartiles in small samples from a normal population. Biom., 25, 79.

Holzinger, K. J., and Church, A. E. R. (1929). On the means of samples from a U-shaped
population. Biom., 20A, 361. 

Horst, P. (1935). A method for determining the coefficients of a characteristic equation. Ann. 
Math. Stats., 6, 83. 

Hostinsky, B. (1937). Sur les probabilités relatives aux variables aléatoires liées entre elles. 
Applications diverses. Ann. Inst. H. Poincaré, 7, 69. 

Hotelling, H. (1925). The distribution of correlation ratios calculated from random data.
Proc. Nat. Acad. Sci., 11, 657.

Hotelling, H. (1927). An application of analysis situs to statistics. Bull. Am. Math. Soc.,
July-Aug., 467.

Hotelling, H. (1930). The consistency and ultimate distribution of optimum statistics. Trans.
Am. Math. Soc., 32, 847.

Hotelling, H. (1931). The generalisation of ‘Student’s’ ratio. Ann. Math. Stats., 2, 360.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components.
Reprinted from J. Educ. Psych. (24, 417), Sept.-Oct. 1933, Warwick and York, Inc.,
Baltimore.

Hotelling, H. (1936a). Simplified calculation of principal components. Psychometrika,
1, 27.


Hotelling, H. (1936b). Relations between two sets of variates. Biom., 28, 321.

Hotelling, H., and Pabst, M. R. (1936c). Rank correlation and tests of significance involving
no assumptions of normality. Ann. Math. Stats., 7, 29.

Hotelling, H., and Frankel, L. R. (1938). The transformation of statistics to simplify their
distribution. Ann. Math. Stats., 9, 87.

Hotelling, H. (1939). Tubes and spheres in n-spaces and a class of statistical problems. Am. J.
Maths., 61, 440.

Hotelling, H. (1940). The selection of variates for use in prediction with some comments on the
problem of nuisance parameters. Ann. Math. Stats., 11, 271.

Hotelling, H. (1941). Experimental determination of the maximum of a function. Ann.
Math. Stats., 12, 20.

Hotelling, H. (1943). Some new methods in matrix calculation. Ann. Math. Stats., 14, 1
and 440. 

Hsu, C. T., and Lawley, D. N. (1939). The derivation of the fifth and sixth moments of b₂ in
samples from a normal population. Biom., 31, 238.

Hsu, C. T. (1940, 1941). On samples from a normal bivariate population. Ann. Math. Stats.,
11, 410; and: Samples from two bivariate normal populations. Ibid., 12, 279.

Hsu, P. L. (1938a). Contribution to the theory of ‘Student’s’ t-test as applied to the problem
of two samples. Stat. Res. Mem., 2, 1. 

Hsu, P. L. (1938b). On the best unbiassed quadratic estimate of the variance. Stat. Res. Mem.,
2, 91.

Hsu, P. L. (1938c). Notes on Hotelling’s generalised T. Ann. Math. Stats., 9, 231.

Hsu, P. L. (1939a). A new proof of the joint product-moment distribution. Proc. Camb. Phil.
Soc., 35, 336.

Hsu, P. L. (1939b). On the distribution of roots of certain determinantal equations. Ann.
Eug. Lond., 9, 250.

Hsu, P. L. (1940). On generalised analysis of variance. Biom., 31, 221.

Hsu, P. L. (1941a). On the limiting distribution of the canonical correlations. Biom., 32, 38.

Hsu, P. L. (1941b). Analysis of variance from the power function standpoint. Biom., 32, 62.

Hsu, P. L. (1941c). On the problem of rank and the limiting distribution of Fisher’s test function. 
Ann. Eug. Lond., 11, 39. 




Hsu, P. L. (1941d). Canonical reduction of the general regression problem. Ann. Eug. Lond.,
11, 42.

Hsu, P. L. (1943). Some simple facts about the separation of degrees of freedom in factorial
experiments. Sankhyā, 6, 253.


Immer, F. R. (1937). Correlation between means and standard deviations in field experiments. 
J. Am. Stat. Ass., 32, 525. 

Ingham, A. E. (1933). An integral which occurs in statistics. Proc. Camb. Phil. Soc., 29, 270.

Irwin, J. O. (1925a). The further theory of Francis Galton’s individual difference problem. 
Biom., 17, 100. 

Irwin, J. O. (1925b). On a criterion for the rejection of outlying observations. Biom., 17, 238.

Irwin, J. O. (1927, 1929). On the frequency-distribution of the means of samples from a popula- 
tion having any law of frequency with finite moments, etc. Biom., 19, 225, and 21, 431. 

Irwin, J. O. (1929a). On the frequency-distribution of any number of deviates from the mean
of a sample from a normal population and the partial correlations between them.
J.R.S.S., 92, 580.

Irwin, J. O. (1929b). Note on the χ²-test for goodness of fit. J.R.S.S., 92, 274.

Irwin, J. O. (1930). On the frequency-distribution of the means of samples from populations 
of certain of Pearson’s types. Metron, 8, No. 4, 51. 

Irwin, J. O. (1931). Mathematical theorems involved in the analysis of variance. J.R.S.S.,
94, 284. 

Irwin, J. O. (1933). A critical discussion of the single-factor theory. Brit. J. Psych., 23, 371. 

Irwin, J. O. (1934). On the independence of the constituent items in the analysis of variance. 
Supp. J.R.S.S., 1, 236.

Irwin, J. O. (1935). Tests of significance for differences between percentages based on small 
numbers. Metron, 12, No. 2, 83. 

Irwin, J. O. (1937a). The frequency-distribution of the difference between two independent 
variates following the same Poisson distribution. J.R.S.S., 100, 415. 

Irwin, J. O. (1937b). Statistical method applied to biological assays. Supp. J.R.S.S., 4, 1.

Irwin, J. O., and Cheeseman, E. A. (1939). On the maximum-likelihood method of
determining dosage response curves. Supp. J.R.S.S., 6, 174.

Irwin, J. O. (1942). On the distribution of a weighted estimate of variance and on analysis of 
variance in certain cases of unequal weighting. J.R.S.S., 105, 115. 

Irwin, J. O., and Kendall, M. G. (1944). Sampling moments of moments for a finite population.
Ann. Eug. Lond., 12, 138.

Isserlis, L. (1914, 1916). On the partial correlation ratio. Part I. Theoretical. Biom.,
10, 391; and Part II. Numerical. Ibid., 11, 50.

Isserlis, L. (1915). On the conditions under which the probable errors of
frequency-distributions have a real significance. Proc. Roy. Soc., A, 92, 23. (Correction,
Biom., 12, 261.)

Isserlis, L. (1916). On certain probable errors and correlation coefficients of multiple
frequency-distributions with skew regression. Biom., 11, 185.

Isserlis, L. (1917). On the representation of statistical data. Biom., 11, 418.

Isserlis, L. (1918a). On the value of a mean as calculated from a sample. J.R.S.S., 81, 75.

Isserlis, L. (1918b). On a formula for the product-moment coefficient of any order of a normal
frequency-distribution in any number of variables. Biom., 12, 134. (Correction,
ibid., 12, 266.)

Isserlis, L. (1918c). Formulae for determining the mean values of products of deviations of
mixed moment coefficients in two to eight variables in samples taken from a limited
population. Biom., 12, 183.

Isserlis, L. (1931). On the moment distributions of moments in the case of samples drawn from
a limited universe. Proc. Roy. Soc., A, 132, 586.

Isserlis, L. (1936). Inverse probability. J.R.S.S., 99, 130.




Jackson, D. (1921). Note on the median of a set of numbers. Bull. Am. Math. Soc., 27, 
160. 

Jackson, D. (1934). Series of orthogonal polynomials. Ann. Math., 34, 527; and: The sum- 
mation of series of orthogonal polynomials. Bull. Am. Math. Soc., 40, 743. 

Jackson, D. (1937). Orthogonal polynomials on a plane curve. Duke Math. J., 3, 228. 

Jackson, D. (1938). Orthogonal polynomials in three variables. Duke Math. J., 4, 441. 

Jackson, R. W. (1936). Tests of statistical hypotheses in the case when the set of alternatives
is discontinuous, illustrated on some genetical problems. Stat. Res. Mem., 1, 138.

Jacob, M. (1933). Sullo sviluppo di una curva di frequenza in serie di Charlier Type B. Giorn.
NG, {ual Biting GS BOA 

Jacob, M. (1935, 1937). Sul fenomeno di Gibbs nello sviluppo in serie di polinomi di Hermite.
Giorn. Ist. Ital. Att., 6, 1, and 8, 297. 

JEFFREYS, H. (1933). On Gauss’s proof of the law of errors. Proc. Camb. Phil. Soc., 29,
231.

JEFFREYS, H. (1937a). On statistically steady distributions in Astronomy. Monthly Not. R. 
Astr. Soc., 97, 59. 

JEFFREYS, H. (1937b). On the relation between direct and inverse methods in statistics. Proc.
Roy. Soc., A, 160, 325. 

JEFFREYS, H. (1937c). The law of errors and the combination of observations. Phil. Trans.,
A, 237, 231.

JEFFREYS, H. (1938a). Significance tests for continuous departures from suggested distributions 
of chance. Proc. Roy. Soc., A, 164, 307. 

JEFFREYS, H. (1938b). The use of minimum χ² as an approximation to the method of maximum
likelihood. Proc. Camb. Phil. Soc., 34, 156. 

JEFFREYS, H. (1938c). Maximum likelihood, inverse probability and the method of moments. 
Ann. Eug. Lond., 8, 146. 

JEFFREYS, H. (1938d). The correction of frequencies for a known standard error of observations. 
Month. Not. R. Astr. Soc., 98, 190.

JEFFREYS, H. (1939a). The Theory of Probability. Cambridge University Press. 

JEFFREYS, H. (1939b). The minimum χ² approximation. Proc. Camb. Phil. Soc., 35, 520.

JEFFREYS, H. (1939c). The posterior probability distributions of the ordinary and intra-class 
correlation coefficients. Proc. Roy. Soc., A, 167, 464. 

JEFFREYS, H. (1939d). The comparison of series of measures on different hypotheses concerning 
the standard errors. Proc. Roy. Soc., A, 167, 367. 

JEFFREYS, H. (1939e). Random and systematic arrangements. Biom., 31, 1.

JEFFREYS, H. (1940). Note on the Behrens-Fisher formula. Ann. Eug. Lond., 10, 48.

JEFFREYS, H. (1941). Some applications of the method of minimum χ². Ann. Eug. Lond.,
11, 108. 

JENKINS, T. N. (1932). A short method and tables for the calculation of the average and standard 
deviation of logarithmic distributions. Ann. Math. Stats., 3, 45. 

JENNETT, W. J., and Welch, B. L. (1939). The control of proportion defective as judged by a
single quality characteristic varying on a continuous scale. Supp. J.R.S.S., 6, 80.

JENSEN, A. (1925). Report on the representative method in statistics. Bull. Int. Stat. Inst., 
22, 1re livre.

Jessen, B., and Wintner, A. (1935). Distribution functions and the Riemann zeta-function. 
Trans. Am. Math. Soc., 38, 48. 

Johnson, E. (1940). Estimates of parameters by means of least squares. Ann. Math. Stats.,
11, 453.

Johnson, N. L., and Welch, B. L. (1939). On the calculation of the cumulants of the
χ-distribution. Biom., 31, 216.

Johnson, N. L., and Welch, B. L. (1940a). Applications of the non-central t-distribution. Biom.,
31, 362. 




Johnson, N. L. (1940b). Parabolic test for linkage. Ann. Math. Stats., 11, 227.

Johnson, P. O., and Neyman, J. (1936). Tests of certain linear hypotheses and their application
to some educational problems. Stat. Res. Mem., 1, 57. 

Jones, H. E. (1937a). Some geometrical considerations in the general theory of fitting lines and 
planes. Metron, 13, No. 1, 21. 

Jones, H. E. (1937b). The nature of regression functions in the correlation analysis of
time-series. Econometrika, 5, 305.

Jones, H. E. (1937c). The theory of runs as applied to time series. Report, Third Annual 
Research Conf. on Economics and Statistics, p. 83. (Cowles Commission.) 

JORDAN, C. (1927). Statistique Mathématique. Gauthier-Villars, Paris.

JORDAN, C. (1932). Approximation and graduation according to the principle of least squares 
by orthogonal polynomials. Ann. Math. Stats., 3, 257. 

Jordan, C. (1933). Inversione della formola di Bernouilli relativa al problema delle prove ripetute
a più variabili. Giorn. Ist. Ital. Att., 4, 505.

Jordan, C. (1934). Teoria della perequazione e dell’approssimazione. Giorn. Ist. Ital. Att.,
Suede 

JØRGENSEN, N. R. (1916). Undersøgelser over Frequensflader og Korrelation. Busck, Copenhagen.


Kac, M. (1939). On a characterisation of the normal distribution. Am. J. Maths., 61, 726. 

Kac, M., and van Kampen, E. R. (1939). Circular equidistributions and statistical independence.
Am. J. Maths., 61, 677. 

Kalecki, M. (1935). A macrodynamic theory of business cycles. Econometrika, 3, 327.

Kamke, E. (1932). Einführung in die Wahrscheinlichkeitstheorie. Hirzel, Leipzig.

Kaplansky, I. (1939). On a generalisation of the ‘problème de rencontres’. Am. Math. Monthly,
46, 159. 

Kapteyn, J. C. (1903). Skew Frequency-Curves in Biology and Statistics. Noordhoff, Groningen
and Wm. Dawson, London. 

Kaucky, J. (1936). Le problème des itérations dans un cas de probabilités dépendantes. Comptes
rendus, 202, 722. 

Kelley, T. L. (1923). Statistical Method. Macmillan, New York.

Kelley, T. L. (1928). Cross-roads in the Mind of Man. Stanford University Press, California.

Kelley, T. L., and McNemar, Q. (1929). Doolittle versus Kelley-Salisbury iteration method for
computing multiple regression coefficients. J. Am. Stat. Ass., 24, 164. 

Kelley, T. L. (1935). An unbiassed correlation ratio measure. Proc. Nat. Acad. Sci., 21, 554.

Kelley, T. L. (1938). The Kelley Statistical Tables. Macmillan, New York.

KENDALL, M. G. (1938a). The conditions under which Sheppard’s corrections are valid. J.R.S.S., 
101, 592. 

KENDALL, M. G. (1938b). A new measure of rank correlation. Biom., 30, 81.

Kendall, M. G., Kendall, S. F. H., and Babington Smith, B. (1939). The distribution of
Spearman’s coefficient of rank correlation, etc. Biom., 30, 251. 

Kendall, M. G., and Babington Smith, B. (1939a). Tables of Random Sampling Numbers.
Tracts for Computers, No. 24, Cambridge University Press. 

Kendall, M. G., and Babington Smith, B. (1939b). The problem of m rankings. Ann. Math.
Stats., 10, 275. 

Kendall, M. G., and Babington Smith, B. (1940). On the method of paired comparisons.
Biom., 31, 324. 

Kendall, M. G. (1940). Some properties of k-statistics. Ann. Eug. Lond., 10, 106; Proof of
Fisher’s rules for ascertaining the sampling semi-invariants of k-statistics. Ibid., 10,
215; The derivation of multivariate sampling formulae from univariate formulae by
symbolic operation. Ibid., 10, 392.

Kendall, M. G. (1941). A theory of randomness. Biom., 32, 1.

Kendall, M. G. (1942a). Partial rank correlation. Biom., 32, 277.




Kendall, M. G. (1942b). On seminvariant statistics. Ann. Eug. Lond., 11, 300.

KENDALL, M. G. (1944). Oscillatory movements in English agriculture. J.R.S.S., 106, 91. 

Kendall, M. G. (1944b). On autoregressive time-series. Biom., 33, 105.

Kershner, R., and Wintner, A. (1936). On the asymptotic distribution of almost periodic
functions with linearly independent frequencies. Am. J. Maths., 58, 91. 

Kermack, W. O., and McKendrick, A. G. (1936). Tests for randomness in a series of numerical
observations. Proc. Roy. Soc. Edin., 57, 228. 

Kermack, W. O., and McKendrick, A. G. (1937). Some distributions associated with a randomly
arranged set of numbers. Proc. Roy. Soc. Edin., 57, 332. 

Kerrich, J. E. (1935). Systems of osculating arcs. J. Inst. Act., 66, 88.

Kerrich, J. E. (1937). Least squares and a generalisation of the ‘Student’-Fisher theorem.
Skand. Akt., 20, 244. 

Keyfitz, N. (1938). Graduation by a truncated normal. Ann. Math. Stats., 9, 66.

Keynes, J. M. (1911). Principal averages and the laws of error which lead to them. J.R.S.S.,
74, 322. 

Keynes, J. M. (1921). A Treatise on Probability. Macmillan, London. 

Khintchine, A. (1928). Begründung der Normalkorrelation nach der Lindebergschen Methode.
Nachr. Forschungsinst. Moskau, 1. 

Khintchine, A. (1932-1933). Sulle successione stazionarie di eventi. Giorn. Ist. Ital. Att., 3,
267; and Über stationäre Reihen zufälliger Variablen. Rec. Mathématiques, Moscou,
40.

Khintchine, A. (1933). Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer,
Berlin. 

Khintchine, A. (1934). Korrelationstheorie der stationären stochastischen Prozesse. Math.
Ann., 109, 604. 

Khintchine, A. (1935). Sul dominio di attrazione della legge di Gauss. Giorn. Ist. Ital. Att.,
6, 378. 

Khintchine, A., and Lévy, P. (1936). Sur les lois stables. Comptes rendus, 202, 374.

Khintchine, A. (1937a). Zur Theorie der unbeschränkt teilbaren Verteilungsgesetze. Rec.
Math. Moscou, 2, 79.

Khintchine, A. (1937b). Series of papers on probability laws in Bull. Univ. État Moscou, Sér.
Int. Sect. A, 1, Fasc. 1, 1, 6; Fasc. 5, 1, 4, 6.

Khintchine, A. (1938). Zwei Sätze über stochastische Prozesse mit stabilen Verteilungen.
Rec. Math. Moscou, 3, 577. 

Kibble, W. F. (1941). A two-variate gamma type distribution. Sankhyā, 5, 137.

Kiser, C. V. (1934). Pitfalls in sampling for population study. J. Am. Stat. Ass., 29, 250.

Kishen, K. (1940). On a simplified method of expressing the components of the second-order
interaction in a 3³ factorial design. Sankhyā, 4, 577.

Kishen, K. (1942). Symmetrical unequal block arrangements. Sankhyā, 5, 329.

Kitagawa, T. (1941). The limit theorems of the stochastic contagious processes. Mem. Fac.
Sci., Kyusyu Imperial University, A, 1, 167.

Kolmogoroff, A. (1929). Bemerkungen zu meiner Arbeit ‘Über die Summen zufälliger Grössen.’
Math. Ann., 102, 484.

Kolmogoroff, A. (1931). Über die analytische Methode in der Wahrscheinlichkeitsrechnung.
Math. Ann., 104, 415. 

Kolmogoroff, A. (1933a). Grundbegriffe der Wahrscheinlichkeitsrechnung, Berlin.

Kolmogoroff, A. (1933b). Sulla determinazione empirica delle leggi di probabilità. Giorn.
Ist. Ital. Att., 4, 83.

Kolmogoroff, A. (1937a). Zur Umkehrbarkeit der statistischen Naturgesetze. Math. Ann.,
113, 766. 

Kolmogoroff, A. (1937b). Chaînes de Markoff avec une infinité dénombrable des états possibles.
Bull. Univ. État Moscou, Sér. Int. Sect. A, 1, Fasc. 3, 1.




Kolmogoroff, A. (1941). Confidence limits for an unknown distribution function. Ann. Math.
Stats., 12, 461.

Kołodziejczyk, St. (1933). Sur l’erreur de la seconde catégorie dans le problème de M. Student.
Comptes rendus, 197, 814. 

Kołodziejczyk, St. (1935). On an important class of statistical hypothesis. Biom., 27, 161.

Konpo, T. (1929). On the standard error of the mean square contingency. Biom., 21, 376. 

Konpo, T. (1930). A theory of the sampling distribution of standard deviations. Biom., 22, 36. 

Konüs, A. A. (1939). The problem of the true index number of the cost of living. Econometrika,
7, 10.

Koopman, B. O. (1936). On distributions admitting a sufficient statistic. Trans. Am. Math.
Soc., 39, 399. 


Koopmans, T. (1937). Linear regression analysis of economic time series. Neth. Econ. Inst., 
No. 20. Haarlem. 

Koopmans, T. (1940). The degree of damping in business cycles. Econometrika, 8, 79.

Koopmans, T. (1941). Distributed lags in dynamic economics. Econometrika, 9, 128.

Koopmans, T. (1942). Serial correlation and quadratic forms in normal variables. Ann. Math.
Stats., 13, 14. 

Koshal, R. S. (1933). Application of the method of maximum likelihood in the improvement
of curves fitted by the method of moments. J.R.S.S., 96, 303. 

Koshal, R. S. (1935). Application of the method of maximum likelihood to the derivation of
efficient statistics for fitting frequency curves. J.R.S.S., 98, 128. 

Koshal, R. S. (1939). Maximal likelihood and minimal χ² in relation to frequency curves. Ann.
Eug. Lond., 9, 209.

Kozakiewicz, M. W. (1937, 1938). Sur les conditions nécessaires et suffisantes pour la
convergence stochastique. Comptes rendus, 205, 1028 and Fund. Math., 31, 160.

Kullback, S. (1934). An application of characteristic functions to the distribution problem of
statistics. Ann. Math. Stats., 5, 264.

Kullback, S. (1935a). On samples from a multivariate normal population. Ann. Math. Stats.,
6, 202. 

Kullback, S. (1935b). On the Bernouilli distribution. Bull. Am. Math. Soc., 41, 857.

Kullback, S. (1935c). A note on the distribution of a certain partial belonging coefficient.
Metron, 12, No. 3, 65.

Kullback, S. (1936a). The distribution laws of the difference and quotient of variables
independently distributed in Pearson Type III laws. Ann. Math. Stats., 7, 51.

Kullback, S. (1936b). On certain distribution theorems of statistics. Bull. Am. Math. Soc.,
42, 407.

Kullback, S. (1936c). A note on the multiple correlation coefficient. Metron, 12, No. 4, 67.

Kullback, S. (1937). On certain distributions derived from the multinomial distribution. Ann.
Math. Stats., 8, 127.

Kunetz, G. (1936). Sur quelques propriétés des fonctions caractéristiques. Comptes rendus, 
202, 1829. 

Kuzmin, R. O. (1939). Sur la loi de distribution du coefficient de corrélation dans les tirages 
d’un ensemble normal. C.R. Acad. Sci. U.S.S.R., 22, 298. 

Kuznets, S. (1929). Random events and cyclical fluctuations. J. Am. Stat. Ass., 24, 258.

Kuznets, S. (1933). Seasonal Patterns in Industry and Trade. New York.


Laderman, J. (1939). The distribution of ‘Student’s’ ratio for sampling of two items drawn
from non-normal universes. Ann. Math. Stats., 10, 376.

Laderman, J., and Lowan, A. N. (1939). On the distribution of nth tabular differences. Ann.
Math. Stats., 10, 360. 

Landahl, H. D. (1938). Centroid orthogonal transformations. Psychometrika, 3, 219.

Laplace, P. S., Marquis de (1818). Théorie analytique des probabilités.




Larmor, Sir J., and Yamaga, N. (1917). On permanent periodicity in sunspots. Proc. Roy.
Soc., A, 93, 493. 

Láska, V. (1935). Contribution à la standardisation des définitions des principales notions
statistiques. Rev. Stat. Tchéchoslovaque, 16, 3.

Lawley, D. N. (1938). A generalisation of Fisher’s z-test. Biom., 30, 180; and Correction, ibid.,
30, 467. 

Le Corbeiller, P. (1933). Les systèmes auto-entretenus et les oscillations de relaxation.
Econometrika, 1, 328.

Ledermann, W. (1938). The orthogonal transformation of a factorial matrix into itself.
Psychometrika, 3, 181.

LEDERMANN, W. (1939). Sampling distribution and selection in a normal population. Biom., 
30, 295. 

Lehmann, A. (1939). Über die Inversion des Gausschen Wahrscheinlichkeits-Integrals. Mitt.
Verein. Schweiz. Versicherungs-Math., 38, 15. 

Lengyel, B. A. (1939). On testing the hypothesis that two samples have been drawn from a
common normal population. Ann. Math. Stats., 10, 365. 

Le Roux, J. M. (1931). A study of the distribution of variance in small samples. Biom.,
23, 134. 

Leser, C. E. V. (1942). Inequalities for multivariate frequency-distributions. Biom., 32, 284.

Lévy, P. (1925). Calcul des Probabilités. Gauthier-Villars, Paris.

Lévy, P. (1931a). Quelques théorèmes sur les probabilités dénombrables. Comptes rendus, 192,
658. 

Lévy, P. (1931b). Sulla legge forte dei grandi numeri. Giorn. Ist. Ital. Att., 4, 1.

Lévy, P. (1931c). Sur un théorème de M. Khintchine. Bull. Sci. Math., (2), 55, 145.

Lévy, P. (1934). Sur les intégrales dont les éléments sont des variables aléatoires indépendantes.
Annali R. Sc. Norm. Sup. Pisa, (2), 3, 337.

Lévy, P. (1935a). Sull’ applicazione della geometria dello spazio di Hilbert allo studio delle 
successioni di variabili casuali. Giorn. Ist. Ital. Att., 6, 13.

Lévy, P. (1935b). Propriétés asymptotiques des sommes de variables aléatoires indépendantes
ou enchaînées. J. Math. Pur. App., (7), 14, 347.

Lévy, P. (1936a). Sur quelques points de la théorie des probabilités dénombrables. Ann. Inst.
H. Poincaré, 6, 153. 

Lévy, P. (1936b). Détermination générale des lois limites. Comptes rendus, 203, 698.

Lévy, P. (1936c). La loi forte des grands nombres pour les variables aléatoires enchaînées.
J. Math. Pur. App., 15, 11. 

Lévy, P. (1937a). L’arithmétique des lois de probabilité. Comptes rendus, 204, 80 and 944; Sur
les exponentielles des polynômes et sur l’arithmétique des produits de lois de Poisson.
Ann. l’École Norm. Sup., 54, 231; and: Nouvelle contribution à l’arithmétique des
produits de lois de Poisson. Comptes rendus, 205, 535.

Lévy, P. (1938). L’arithmétique des lois de probabilité. J. Math. Pur. App., 27, 17.

Lévy, P. (1938b, 1939). Sur la définition des lois de probabilités par leurs projections. Comptes
rendus, 206, 1240, and: Rectification. Ibid., 206, 1699. Also: Sur les projections
d’une loi de probabilité à n variables. Bull. Sci. Math., (2), 63, 148.

Lévy, P. (1939a). L’addition des variables aléatoires définies sur une circonférence. Bull. Soc.
Math. France, 67, 1. 

Lévy, P. (1939b). Extensions stochastiques des notions de série, d’intégrale et d’aire. Comptes
rendus, 209, 591. 

Lévy, P. (1939c). Sur la division d’un segment par les points choisis au hasard. Comptes rendus,
208, 147. 

Lewis, W. T. (1935). A reconsideration of Sheppard’s corrections. Ann. Math. Stats., 6, 11. 

Lexis, W. (1903). Abhandlungen zur Theorie der Bevölkerungs- und Moralstatistik. Fischer,
Jena.




Liapounoff, A. (1900). Sur une proposition de la théorie des probabilités. Bull. Acad. Sci.
St. Pét., (5), 13, 359. 

Liapounoff, A. (1901). Nouvelle forme du théorème sur la limite de probabilité. Mem. Acad.
Sci. St. Pét., (8), 12, No. 5. 

Lidstone, G. T. (1933). Notes on orthogonal polynomials, etc. J. Inst. Act., 64, 128.

Lidstone, G. T. (1937). Notes on interpolation, etc. J. Inst. Act., 68, 267.

Lindblad, T. (1937). Zur Theorie der Korrelation bei mehr-dimensionalen zufälligen Variablen.
Acta Soc. Sci. Fennicae, (2), 10, 1.

Lindeberg, J. W. (1922). Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung. Math. Zeit., 15, 211.

Lomnicki, A. (1923). Nouveaux fondements du calcul des probabilités. Fund. Math., 4, 34.

Lonseth, A. T. (1942). Systems of linear equations with coefficients subject to error. Ann.
Math. Stats., 13, 332. 

Lorenz, P. (1931). Der Trend. Vierteljahrshefte zur Konjunkturforschung. 2. Auflage, Sonderheft,
21. 

Lorenz, P. (1935). Annual survey of statistical technique: Trends and seasonal variations. 
Econometrika, 3, 456. 

Lotka, A. J. (1938). Some recent results in population analysis. J. Am. Stat. Ass., 33, 164.

Lotka, A. J. (1939). On an integral equation in population analysis. Ann. Math. Stats.,
10, 144. 

Lüders, R. (1934). Die Statistik der seltenen Ereignisse. Biom., 26, 108.

Lukomski, J. (1939). On some properties of multidimensional distributions. Ann. Math. Stats.,
10, 236. 

Lurquin, C. (1937). Sur la loi de Bernouilli à deux variables. Bull. Classe Sci. Acad. R. Belgique,
(5), 23, 857. 


Macaulay, F. R. (1931). Smoothing of Time-Series. National Bureau of Economic Research,
New York. 

MacMahon, P. A. (1915, 1917). Combinatory Analysis. Cambridge University Press.

MacStewart, W. (1941). A note on the power of the sign test. Ann. Math. Stats., 12, 236. 

Madow, W. G. (1937). Contributions to the theory of comparative statistical analysis.
I. Fundamental theorems of comparative analysis. Ann. Math. Stats., 8, 159.

Madow, W. G. (1938). Contributions to the theory of multivariate statistical analysis. Trans.
Am. Math. Soc., 44, 454. 

Madow, W. G. (1939). Generalisation of the Laplace-Liapounoff theorem. Ann. Math. Stats.,
10, 84. 

Madow, W. G. (1940). Limiting distributions of quadratic and bilinear forms. Ann. Math.
Stats., 11, 125.

Mahalanobis, P. C. (1922). On errors of observation and upper air relationships. Mem. Ind.
Met. Dep., 24.

Mahalanobis, P. C. (1930). On tests and measures of group divergence. I. J. Asiat. Soc. Beng.,
26, 541.


Mahalanobis, P. C. (1933). Tables of L-tests. Sankhyā, 1, 109.

Mahalanobis, P. C., Bose, S. S., Roy, P. R., and Banerji, S. K. (1934). Tables of random
samples from a normal population. Sankhyā, 1, 289.

Mahalanobis, P. C. (1936a). On the generalised distance in statistics. Proc. Nat. Inst. Sci. Ind.,
12, 49. 

Mahalanobis, P. C., Bose, R. C., and Roy, S. N. (1936b). Normalisation of statistical variates
and the use of rectangular coordinates in the theory of sampling distributions. Sankhyā,
3, 1.

Mahalanobis, P. C. (1943). An inquiry into the prevalence of drinking tea among middle-class
Indian families in Calcutta. Sankhyā, 6, 283.




Mahlmann, H. (1935). Ein Beitrag zu Untersuchungen über zweidimensionale Verteilungen von
Massenpunkten bei zufallsartig bedingten Bewegungen. Biom., 27, 191. 

Mallock, R. R. M. (1933). An electrical calculating machine. Proc. Roy. Soc., A, 140, 457.

Mann, H. B., and Wald, A. (1942). On the choice of the number of intervals in the application
of the χ²-test. Ann. Math. Stats., 13, 306.

Mann, H. B. (1943). On the construction of sets of orthogonal Latin squares. Ann. Math. Stats.,
14, 401. 

Marbe, K. (1934). Grundfragen der angewandten Wahrscheinlichkeitsrechnung und theoretischer
Statistik. München and Berlin.

March, L. (1926). L’analyse de la variabilité. Metron, 6, No. 2, 3.

Marchand, E. (1937). Probabilités expérimentales, probabilités corrigées et probabilités
indépendantes. Mitt. Verein. Schweiz. Versich-Math., 33, 49.

Marcinkiewicz, J., and Zygmund, A. (1937). Sur les fonctions indépendantes. Fund. Math.,
29, 60.

Marcinkiewicz, J. (1939). Sur le problème des moments. Comptes rendus, 208, 405.

Markoff, A. A. (1912). Wahrscheinlichkeitsrechnung. Teubner, Leipzig.

Marples, P. M. (1932). Linear difference equations. J. Inst. Act., 63, 404.

Marseguerra, V. (1936). Considerazioni sulla cosidetta legge sinusoidale nel calcolo della
probabilità. Giorn. Ist. Ital. Att., 7, 206.

Martin, E. S. (1934). On the correction for the moment coefficients of frequency-distributions
when the start of the frequency is one of the characteristics to be determined. Biom., 
26, 12. 

Martin, E. S. (1936). A study of an Egyptian series of mandibles with special reference to
mathematical methods of sexing. Biom., 28, 149.

Mather, K. (1935). The combination of data. Ann. Eug. Lond., 6, 399.

Mathisen, H. C. (1943). A method of testing the hypothesis that two samples are from the
same population. Ann. Math. Stats., 14, 188.

Matuszewski, T., Neyman, J., and Supińska, J. (1935). Statistical studies in questions of
bacteriology. Part I. The accuracy of the dilution method. Supp. J.R.S.S., 2, 63.

Mazzoni, P. (1934). Su un’ origine geometrica di tipi di distribuzioni di frequenze. Giorn. Ist.
Ital. Att., 5, 219.

McCarthy, M. D. (1939). On the application of the z-test to randomised blocks. Ann. Math.
Stats., 10, 337. 

McCrea, W. H. (1936). A problem on random paths. Math. Gaz., 20, 311. 

McKay, A. T. (1931). Distribution of the estimated coefficient of variation. J.R.S.S., 94, 564. 

McKay, A. T. (1932). A Bessel function distribution. Biom., 24, 39. 

McKay, A. T., Fieller, E. C., and Pearson, E. S. (1932). Distribution of the coefficient of
variation and extended t-distribution. J.R.S.S., 95, 695.

McKay, A. T. (1933). The distribution of √β₁ in samples of four from a normal universe. Biom.,
25, 204; and: The distribution of β₂ in samples of four from a normal universe. Biom.,
25, 411.

McKay, A. T., and Pearson, E. S. (1933). A note on the distribution of range in samples of n.
Biom., 25, 415. 

McKay, A. T. (1934). Sampling from batches. Supp. J.R.S.S., 1, 207. 

McKay, A. T. (1935). The distribution of the difference between the extreme observation and 
the sample mean in samples of n from a normal universe. Biom., 27, 466. 

McKinsey, J. C. C. (1939). A note on Reichenbach’s axioms for probability implication. Bull. 
Am. Math. Soc., 45, 799. 

McMullen, L. (1936). The standard deviation of a difference. Ann. Eug. Lond., 7, 105.

Meixner, J. (1938). Erzeugende Funktionen der Charlierschen Polynome. Math. Zeit., 44, 531.

Mendershausen, H. (1937a). An example of meaningful curvilinear regression in economic
time-series. Econometrika, 5, 329.




MENDERSHAUSEN, H. (1937b). Annual survey of statistical technique: methods of computing
and eliminating changing seasonal fluctuations. Econometrika, 5, 234.

MENDERSHAUSEN, H. (1939). Clearing variates in confluence analysis. J. Am. Stat. Ass., 
34, 93. 

Menge, W. O. (1937). A statistical treatment of actuarial functions. The Record, 26, 65.

Merrill, W. W. (1937). Sampling theory in item analyses. Psychometrika, 2, 215.

Merrington, M. (1941). Numerical approximations to the percentage points of the
χ²-distribution. Biom., 32, 200.

Merrington, M. (1942). Tables of the percentage points of the t-distribution. Biom., 32, 300.

Merrington, M., and Thompson, C. M. (1943). Tables of the percentage points of the inverted
beta (F) distribution. Biom., 33, 73.

Merzrath, E. (1933). Anpassung von Flächen an zwei-dimensionale Kollektivgegenstände und
ihre Auswertung für die Korrelationstheorie. Metron, 11, No. 2, 103.

Messina, L. (1933). Un teorema sulla legge uniforme dei grandi numeri. Giorn. Ist. Ital. Att., 
4, 116. 

Mihoc, G. (1934). Sur les chaînes multiples discontinues. Comptes rendus, 198, 2135.

Mihoc, G. (1935). Sur la détermination de l’intervalle de contraction de la formule de la moyenne.
Comptes rendus, 200, 1654. 

Miller, J. C. P. (1934). On a special case in the determination of probable errors. Month. Not.
R. Astr. Soc., 94, 860. 

Mitchell, W. C. (1913). Business Cycles. Univ. of California Press, Berkeley.

Mitchell, W. C., and Burns, A. F. (1935). The National Bureau’s Measures of Cyclical Behaviour.
Bull. 57, National Bureau of Economic Research. 

Moisseiev, N. (1937). Über Stabilitätswahrscheinlichkeitsrechnung. Math. Zeit., 42, 513.

Molina, E. C. (1931). Bayes’ theorem. Ann. Math. Stats., 2, 23.

Molina, E. C. (1942). Tables of Poisson’s Exponential Limit. Van Nostrand Co., Inc., New York.

Mood, A. M. (1939). On the L₁ test for many samples. Ann. Math. Stats., 10, 187.

Mood, A. M. (1940). The distribution theory of runs. Ann. Math. Stats., 11, 367.

Mood, A. M. (1943). On the dependence of sampling inspection plans upon population
distributions. Ann. Math. Stats., 14, 415.

Moore, H. L. (1914). Economic Cycles: their law and cause. Macmillan, New York.

Moore, H. L. (1923). Generating Economic Cycles. Macmillan, New York.

Moore, T. V. (1937). Reduction of data showing non-linear regression for correlation by the
ordinary product-moment formula, and the measurement of error due to linear regression. 
J. Educ. Psych., 28, 205. 

Morant, G. M. (1921). On random occurrences in space and time when followed by a closed 
interval. Biom., 13, 309.

Morant, G. M. (1939). The use of statistical methods in the investigation of problems of classifica- 
tion in anthropology. I. The general nature of the material and the form of intra-racial 
distributions of metrical characters. Biom., 31, 72.

Morgan, W. A. (1939). A test for the significance of the difference between the two variances
in a sample from a normal bivariate population. Biom., 31, 13. 

Mortara, G. (1934). Sulle disuguaglianze statistiche. 22ª Sessione dell’Ist. Int. Stat. London.

Mosak, J. L. (1939). The least-square standard error of the coefficient of elasticity of demand. 
J. Am. Stat. Ass., 34, 353. 

Moulton, E. J. (1938). The periodic function obtained by repeated accumulation of a statistical
series. Am. Math. Monthly, 45, 583. 

Mouzon, E. D. (1930). Equimodal frequency distributions. Ann. Math. Stats., 1, 137. 

Muench, H. (1936). The probability distribution of protection test results. J. Am. Stat. Ass.,
31, 677.

Muench, H. (1938). Discrete frequency-distributions arising from mixtures of several single
probability values. J. Am. Stat. Ass., 33, 390. 




Müller, J. H. (1931). On the application of continued fractions to the evaluation of certain
integrals, with special reference to the incomplete Beta-function. Biom., 22, 284.

Musselman, J. R. (1926). On the linear correlation ratio in the case of certain symmetrical
frequency-distributions. Biom., 18, 228.

Myers, R. J. (1934). Note on Koshal’s method of improving the parameters of curves by the use
of maximum likelihood. Ann. Math. Stats., 5, 320.


Nagel, E. (1936). The meaning of probability. J. Am. Stat. Ass., 31, 10.

Nair, A. N. K. (1942). On the distribution of Student’s t and the correlation coefficient in samples
from non-normal population. Sankhyā, 5, 393.

Nair, K. R. (1936). A note on the extension of lagging correlations between two random series.
J.R.S.S., 99, 559.

Nair, K. R. (1938a). On Tippett’s random sampling numbers. Sankhyā, 4, 65.

Nair, K. R. (1938b). On a method of getting confounded arrangements in the general symmetrical
type of experiment. Sankhyā, 4, 121.

Nair, K. R. (1940a). The application of covariance technique to field experiments with missing
or mixed-up yields. Sankhyā, 4, 581.

Nair, K. R. (1940b). Table of confidence intervals for the median in samples from any continuous
population. Sankhyā, 4, 551.

Nair, K. R. (1941). Balanced confounded arrangements for the 5ⁿ type of experiment. Sankhyā,
5 ren 

Nair, K. R. (1942). A note on the method of fitting constants for analysis of non-orthogonal
data arranged in double classification. Sankhyā, 5, 317.

Nair, K. R., and Rao, C. R. (1942). A note on partially balanced incomplete block designs.
Science and Culture, 7, 568. 

Nair, K. R., and Srivastava, M. P. (1942). On a simple method of curve-fitting. Sankhyā,
6, 121.

Nair, K. R. (1943). Certain inequality relationships among the combinatorial parameters of
incomplete block designs. Sankhyā, 6, 255.

Nair, K. R., and Banerjee, K. S. (1943). A note on fitting straight lines if both variables are
subject to error. Sankhyā, 6, 331.

Nair, U. S. (1936). The standard error of Gini’s mean difference. Biom., 28, 428. 

Nair, U. S. (1939). The application of the moment function in the study of distribution laws in
statistics. Biom., 30, 274. 

Nair, U. S. (1941a). Probability statements regarding the ratio of standard deviations and
correlation coefficient in a bivariate normal population. Sankhyā, 5, 151.

Nair, U. S. (1941b). A comparison of tests for the significance of the difference between two
variances. Sankhyā, 5, 157.

Narumi, S. (1923a). On the general forms of bivariate frequency-distributions which are
mathematically possible when regression and variation are subjected to limiting conditions.
Part I, Biom., 15, 77, and Part II, Biom., 15, 209. 

Narumi, S. (1923b). On further inequalities with possible application to problems in the theory
of probability. Biom., 15, 245. 

Nayer, P. P. N. (1936). An investigation into the application of Neyman and Pearson’s L₁ test,
with tables of percentage limits. Stat. Res. Mem., 1, 38. 

Newbold, E. M. (1925). Notes on an experimental test of errors in partial correlation coefficients
derived from fourfold and biserial total coefficients. Biom., 17, 251. 

Newbold, E. M. (1927). Practical application of the statistics of repeated events, particularly
to industrial accidents. J.R.S.S., 90, 487. 

Newland, W. F., and Neal, E. E. (1939). Statistical control of the quality of telephone service.
Supp. J.R.S.S., 6, 25.




Newman, D. (1939). The distribution of range in samples from a normal population, expressed 
in terms of an independent estimate of standard deviation. Biom., 31, 20. 

Neyman, J. (1925). Contributions to the theory of small samples drawn from a finite population.
Biom., 17, 472. 

Neyman, J. (1926). Further notes on non-linear regression. Biom., 18, 257. 

Neyman, J., and Pearson, E. S. (1928). On the use and interpretation of certain test criteria for
purposes of statistical inference. Biom., 20A, 175 and 263. 

Neyman, J., and Pearson, E. S. (1931a). Further notes on the χ²-distribution. Biom., 22, 298.

Neyman, J., and Pearson, E. S. (1931b). On the problem of k samples. Bull. Acad. Polonaise
Sci. Lett. Series A, 460.

Neyman, J., and Pearson, E. S. (1933a). On the testing of statistical hypotheses in relation to
probability a priori. Proc. Camb. Phil. Soc., 29, 492. 

Neyman, J. (1933b). An outline of the theory and practice of representative method applied in
social research. Polish Inst. Social Problems. Actuarial Series, No. 1. Warsaw. 

Neyman, J., and Pearson, E. S. (1933c). On the problem of the most efficient tests of statistical
hypotheses. Phil. Trans., A, 231, 289. 

Neyman, J. (1934). On two different aspects of the representative method, etc. J.R.S.S., 
97, 558. 

Neyman, J. (1935a). Su un teorema concernente le cosidette statistiche sufficienti. Giorn. Ist.
Ital. Att., 6, 320. 

Neyman, J. (1935b). Sur la vérification des hypothèses statistiques composées. Bull. Soc. Math.
France, 63, 1. 

Neyman, J., Iwaszkiewicz, K., and Kołodziejczyk, St. (1935c). Statistical problems in
agricultural experimentation. Supp. J.R.S.S., 2, 107.

Neyman, J., and Pearson, E. S. (1936a). Sufficient statistics and uniformly most powerful tests
of statistical hypotheses. Stat. Res. Mem., 1, 113. 

Neyman, J., and Tokarska, B. (1936b). Errors of the second kind in testing ‘Student’s’
hypothesis. J. Am. Stat. Ass., 31, 318.

Neyman, J., and Pearson, E. S. (1936, 1938). Contributions to the theory of testing statistical
hypotheses: I. Unbiassed critical regions of Type A and Type A₁. Stat. Res. Mem.,
1, 1; II. Certain theorems on unbiassed critical regions of Type A; III. Unbiassed tests
of simple statistical hypotheses specifying the values of more than one unknown parameter.
Ibid., 2, 25.

Neyman, J. (1937a). ‘Smooth test’ for goodness of fit. Skand. Akt., 20, 149. 

Neyman, J. (1937b). Outline of a theory of statistical estimation based on the classical theory
of probability. Phil. Trans., A, 236, 333. 

Neyman, J. (1938a). Contribution to the theory of sampling human populations. J. Am. Stat.
Ass., 33, 101. 

Neyman, J. (1938b). Tests of statistical hypotheses which are unbiassed in the limit. Ann. Math.
Stats., 9, 69. 

Neyman, J. (1938c). On statistics the distribution of which is independent of the parameters 
involved in the original probability law of the observed variables. Stat. Res. Mem., 
2, 58. 

Neyman, J., and Pearson, E. S. (1938d). Note on some points in ‘Student’s’ paper on
‘Comparison between balanced and random arrangements of field plots.’ Biom., 29, 380.

Neyman, J. (1939a). On a new class of ‘contagious’ distributions applicable in entomology and
bacteriology. Ann. Math. Stats., 10, 35. 

Neyman, J. (1939b). On the hypotheses underlying the applications of statistical methods to 
routine laboratory analysis. Ann. Math. Stats., 10, 87. 

Neyman, J. (1941a). Fiducial argument and the theory of confidence intervals. Biom., 32, 128.

Neyman, J. (1941b). On a statistical problem arising in routine analysis and in sampling inspections
of mass production. Ann. Math. Stats., 12, 46.


Neyman, J. (1942). Basic ideas and some recent results of the theory of testing statistical hypo- 
theses. J.R.S.S., 105, 292. 

Nicholson, C. (1941). A geometrical analysis of the frequency-distribution of the ratio between
two variables. Biom., 32, 16.

Nicholson, C. (1943). The probability integral for two variables. Biom., 33, 59.

Norris, N. (1935). Inequalities among averages. Ann. Math. Stats., 6, 27. 

Norris, N. (1937). Convexity properties of generalised mean value functions. Ann. Math. 
Stats., 8, 118. 

Norris, N. (1938). Some efficient measures of relative dispersion. Ann. Math. Stats., 9, 214. 

Norris, N. (1939). The standard errors of the geometric and harmonic means. Ann. Math. 
Stats., 10, 84. 

Norris, N. (1940). The standard errors of the geometric means. Ann. Math. Stats., 11, 445. 

Norton, H. W. (1937). Use of series in an exact test of significance in a discontinuous distribution. 
Ann. Eug. Lond., 7, 349.

Norton, H. W. (1939). The 7 x 7 squares. Ann. Eug. Lond., 9, 269. 

Norton, K. A. (1938). Limits to the accuracy of estimated moment coefficients. Sankhyā,
3, 265. 

Nydell, S. (1919). The mean errors of the characteristics in logarithmic normal distributions.
Skand. Akt., 2, 134. 


Olds, E. G. (1935). Distribution of greatest variates, least variates and intervals of variation
in samples from a rectangular universe. Bull. Am. Math. Soc., 41, 297.

Olds, E. G. (1937). On the remainder in the approximate evaluation of the probability in the
symmetrical case of James Bernoulli’s theorem. Bull. Am. Math. Soc., 43, 806.

Olds, E. G. (1938a). A moment-generating function which is useful in solving certain matching
problems. Bull. Am. Math. Soc., 44, 407.

Olds, E. G. (1938b). Distribution of sums of squares of rank differences for small numbers of
individuals. Ann. Math. Stats., 9, 133.

Olds, E. G. (1939). Remarks on two methods of sampling inspection. Ann. Math. Stats., 10,
87.

Olds, E. G. (1940). On a method of sampling. Ann. Math. Stats., 11, 355.

Olshen, C. A. (1938). Transformations of the Pearson Type III distribution. Ann. Math. Stats.,
9, 176.

Onicescu, O., and Mihoc, G. L. (1935-1939). Sur les chaînes de variables statistiques. Comptes
rendus, 200, 511; 202, 2031; L’allure asymptotique de la somme des variables d’une
chaîne de Markoff discontinue. Ibid., 205, 481; Sur les sommes de variables enchaînées.
Bull. Math. Soc. Roum. Sci., 41, 99.

Oppenheim, S. (1909). Über die Bestimmung der Periode einer periodischen Erscheinung nebst
Anwendung auf die Theorie des Erdmagnetismus. Wien. Sitzber., 2a, 118.

O’Toole, A. L. (1931, 1932). On symmetric functions and symmetric functions of symmetric
functions. Ann. Math. Stats., 2, 101 and: (Multivariate case), ibid., 3, 56.

O’Toole, A. L. (1933). On the system of curves for which the method of moments is the best
method of fitting. Ann. Math. Stats., 4, 1.

O’Toole, A. L. (1934). On the best values of r in samples of R from a finite population of N.
Ann. Math. Stats., 5, 146.

Ottestad, P. (1937). On some discontinuous frequency functions and frequency distributions.
Skand. Akt., 20, 75.

Ottestad, P. (1939). On the use of the factorial moments in the study of discontinuous frequency
distributions. Skand. Akt., 22, 22.


Pae-Tsi-Yuan (1933). On the logarithmic frequency-distribution and the semi-logarithmic
correlation surface. Ann. Math. Stats., 4, 30. 

Pairman, E., and Pearson, K. (1919). On the corrections for moment coefficients of
limited-range frequency-distributions when there are finite or infinite ordinates and any slopes
at the terminals of the range. Biom., 12, 231. 

Palm, C. (1937). Inhomogeneous telephone traffic in full-availability groups. Ericsson Technics,
No. 1, Stockholm. 

Panse, V. G. (1939). Preliminary studies on sampling in field experiments. Sankhyā, 4, 139.

Paulson, E. (1941). On certain likelihood ratio tests associated with the exponential distribution.
Ann. Math. Stats., 12, 301.

Paulson, E. (1942). An approximate normalisation of the analysis of variance distribution.
Ann. Math. Stats., 13, 233. 

Pearl, R., and Reed, L. J. (1923). On the mathematical theory of population growth. Metron,
3, No. 1, 6.

Pearl, R. (1930). Introduction to Medical Biometry and Statistics. Saunders and Co., Philadelphia
and London. 

Pearl, R., and Miner, J. R. (1935). On the comparison of groups in respect of a number of
measured characters. Human Biology, 7, 95. 

Pearl, R. (1937). On the moment product-sums of frequency-distributions. Human Biology,
9, 410. 

Pearse, G. E. (1928). On corrections for the moment coefficients of frequency-distributions 
when there are infinite ordinates at one or both terminals of the range. Biom., 
20A, 314. 

Pearson, E. S. (1923). The probable error of a class-index correlation. Biom., 14, 261.

Pearson, E. S. (1924). Note on the approximations to the probable error of a coefficient of 
correlation. Biom., 16, 196. 

Pearson, E. S. (1925). Bayes’ theorem, examined in the light of experimental sampling. Biom.,
17, 388. 

Pearson, E. S. (1926). A further note on the distribution of range in samples taken from a normal
population. Biom., 18, 173. 

Pearson, E. S. (1927). Further note on the linear correlation ratio. Biom., 19, 223.

Pearson, E. S., and Adyanthaya, N. K. (1928, 1929). The distribution of frequency constants
in small samples from non-normal symmetrical and skew populations. Biom., 20A, 356,
and 21, 259. 

Pearson, E. S. (1929). Some notes on sampling tests with two variables. Biom., 21, 337.

Pearson, E. S. (1930). A further development of tests for normality. Biom., 22, 239.

Pearson, E. S., and Neyman, J. (1930). On the problem of two samples. Bull. Acad. Polonaise
Sci. Lett. Series A, 73.

Pearson, E. S. (1931a, 1932). The test of significance for the correlation coefficient. J. Am. Stat.
Ass., 26, 128, and 27, 424.

Pearson, E. S. (1931b). The analysis of variance in cases of non-normal variation. Biom.,
23, 114.

Pearson, E. S. (1932). The percentage limits for the distribution of range in samples from a
normal population. Biom., 24, 404.

Pearson, E. S. (1933a). Statistical method in the control and standardisation of the quality of
manufactured products. J.R.S.S., 96, 21.

Pearson, E. S., and Wilks, S. S. (1933b). Methods of statistical analysis appropriate for k samples
of two variables. Biom., 25, 353. 

Pearson, E. S. (1934). Sampling problems in industry. Supp. J.R.S.S., 1, 107. 

Pearson, E. S., and Haines, J. (1935a). The use of range in place of standard deviation in small
samples. Supp. J.R.S.S., 2, 83.

Pearson, E. S., and Sukhatme, A. V. (1935b). An illustration of the use of fiducial limits in
determining the characteristics of a sampled batch. Sankhyā, 2, 13.

Pearson, E. S. (1935c). A comparison of β₂ and Mr. Geary’s wₙ criterion. Biom., 27, 333.


Pearson, E. S., and Chandra Sekar, C. (1936). The efficiency of statistical tools and a criterion for
the rejection of outlying observations. Biom., 28, 308.

Pearson, E. S. (1937a). Maximum likelihood and methods of estimation. Biom., 29, 155.

Pearson, E. S. (1937b, 1938). Some aspects of the problem of randomisation. Biom., 29, 53.
II. An illustration of ‘ Student’s’ inquiry into the effect of balancing in agricultural 
experiments. Biom., 30, 159. 

Pearson, E. S. (1938). The probability integral transformation for testing goodness of fit and 
combining independent tests of significance. Biom., 30, 134. 

Pearson, E. S. (1939). Note on the inverse and direct methods of estimation in R. D. Gordon’s 
problem. Biom., 31, 181. 

Pearson, E. S. (1941). A note on further properties of statistical tests. Biom., 32, 59. 

Pearson, E. S. (1942a). Notes on testing statistical hypotheses. Biom., 32, 311.

Pearson, E. S., and Hartley, H. O. (1942b). The probability integral of the range in samples
of n observations from a normal population. Biom., 32, 301.

Pearson, E. S., and Hartley, H. O. (1943). Tables of the probability integral of the ‘studentised’
range. Biom., 33, 89.

Pearson, K. (1894). Contributions to the mathematical theory of evolution. Phil. Trans., A,
185, 71.

Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation
in homogeneous material. Phil. Trans., A, 186, 343. 

Pearson, K. (1896). Mathematical contributions to the theory of evolution. III. Regression, 
heredity and panmixia. Phil. Trans., A, 187, 253. 

Pearson, K., and Lee, A. (1897a). On the distribution of frequency (variation and correlation)
of the barometric heights at diverse stations. Phil. Trans., A, 190, 423. 

Pearson, K. (1897b). Mathematical contributions to the theory of evolution. On a form of 
spurious correlation which may arise when indices are used in the measurement of organs. 
Proc. Roy. Soc., 60, 489. 

Pearson, K., and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution.
IV. On the probable errors of frequency constants and on the influence of random selection 
on variation and correlation. Phil. Trans., A, 191, 229. 

Pearson, K., Lee, A., and Bramley-Moore, L. (1899a). Mathematical contributions to the
theory of evolution. VI. Genetic (reproduction) selection. Inheritance of fertility in
man and of fecundity in thoroughbred racehorses. Phil. Trans., A, 192, 257.

Pearson, K. (1899b). On certain properties of the hypergeometrical series and on the fitting of
such series to observation polygons in the theory of chance. Phil. Mag., (5), 47, 236.

Pearson, K. (1900a). Mathematical contributions to the theory of evolution. VII. On the
correlation of characters not quantitatively measurable. Phil. Trans., A, 195, 1.

Pearson, K. (1900b). Mathematical contributions to the theory of evolution. VIII. On the
inheritance of characters not capable of exact quantitative measurement. Phil. Trans.,
A, 195, 79.

Pearson, K. (1900c). On a criterion that a given system of deviations from the probable in the 
case of a correlated system of variables is such that it can be reasonably supposed to have 
arisen in random sampling. Phil. Mag., (5), 50, 157. 

Pearson, K., and others (1901a). Mathematical contributions to the theory of evolution.
IX. On the principle of homotyposis, etc. Phil. Trans., A, 197, 285.

Pearson, K. (1901b). Mathematical contributions to the theory of evolution. X. Supplement
to a memoir on skew variation. Phil. Trans., A, 197, 443.

Pearson, K. (1901c). On lines and planes of closest fit to systems of points in space. Phil. Mag.,
(6), 2, 559.

Pearson, K. (1902a). Mathematical contributions to the theory of evolution. XI. On the
influence of natural selection on the variability and correlation of organs. Phil. Trans.,
A, 200, 1.


Pearson, K. (1902b). On the modal value of an organ or character. Biom., 1, 256.

Pearson, K. (1902c). Note on Francis Galton’s problem. Biom., 1, 390.

Pearson, K. (1903, 1913, 1920). On the probable errors of frequency constants. Biom., 2, 
273; 9, 1 and 13, 113. 

Pearson, K. (1904). Mathematical contributions to the theory of evolution. XIII. On the 
theory of contingency and its relation to association and normal correlation. Drapers’ 
Co. Res. Mem. Biometric Series I. Cambridge University Press (formerly Dulau and 
Co.). 

Pearson, K. (1905). Mathematical contributions to the theory of evolution. XIV. On the 
general theory of skew correlation and non-linear regression. Drapers’ Co. Res. Mem. 
Biometric Series II. Cambridge University Press. 

Pearson, K., and Blakeman, J. (1906). On the probable error of mean-square contingency.
Biom., 5, 191; (Pearson alone, 1915). Biom., 10, 570; (with A. W. Young, 1916) On
the probable error of a coefficient of contingency without approximation. Biom., 11, 215.

Pearson, K. (1907a). Mathematical contributions to the theory of evolution. XVI. On further
methods of determining correlation. Drapers’ Co. Res. Mem. Biometric Series IV.
Cambridge University Press.

Pearson, K. (1907b). On the influence of past experience on future expectation. Phil. Mag.,
(6), 13, 365.

Pearson, K., and Lee, A. (1908). On the generalised probable error in multiple normal correlation.
Biom., 6, 59.

Pearson, K. (1909). On a new method of determining correlation between a measured character
A and a character B, etc. Biom., 6, 96.

Pearson, K. (1910). On a new method of determining correlation when one variable is given by
alternative and the other by multiple categories. Biom., 7, 248.

Pearson, K. (1911a). On the probability that two independent distributions of frequency are
really samples from the same population. Biom., 8, 250.

Pearson, K. (1911b). On a correction to be made to the correlation ratio η. Biom., 8, 254.

Pearson, K. (1912a). Mathematical contributions to the theory of evolution. XVIII. On a 
novel method of regarding the association of two variates classed solely in alternate 
categories. Drapers’ Co. Res. Mem. Biometric Series VII. Cambridge University 
Press. 

Pearson, K. (1912b, 1913). On the appearance of multiple cases of disease in the same house.
Biom., 8, 404, and 9, 28.

Pearson, K. (1913a). On the probable error of a coefficient of correlation as found from a fourfold
table. Biom., 9, 22. 

Pearson, K. (1913b). On the measurement of the influence of ‘ broad categories ’ on correlation. 
Biom., 9, 116. 

Pearson, K. and Heron, D. (1913c). On theories of association. Biom., 9, 159. 

Pearson, K. (1913d). Note on the surface of constant association. Biom., 9, 534. 

Pearson, K., editor, (1914, 1931). Tables for Statisticians and Biometricians, Part I (1914, 3rd edn. 
1930) and Part II (1931). Cambridge University Press. 

Pearson, K., and Cave, B. M. (1914). Numerical illustrations of the variate-difference correlation
method. Biom., 10, 340. 

Pearson, K. (1914, 1921). On an extension of the method of correlation by grades or ranks. 
Biom., 10, 416; and Second note, Biom., 13, 302.

Pearson, K. (1915a). On the partial correlation ratio. Proc. Roy. Soc., A, 91, 492.

Pearson, K. (1915b). On certain types of compound frequency-distributions in which the components
can be individually described by binomial series. Biom., 11, 139.

Pearson, K. (1916a). Mathematical contributions to the theory of evolution. XIX. Second
supplement to a memoir on skew variation. Phil. Trans., A, 216, 429. (Correction, Biom.,
12, 259.) 


Pearson, K. (1916b). On the general theory of multiple contingency with special reference to
partial contingency. Biom., 11, 145.

Pearson, K., and Tocher, J. (1916c). On criteria for the existence of differential death-rates.
Biom., 11, 145. 

Pearson, K. (1916d). On some novel properties of partial and multiple correlation coefficients in 
a universe of manifold characteristics. Biom., 11, 231. 

Pearson, K. (1916e). On the application of ‘ goodness of fit’ tables to test regression curves 
(and theoretical curves) used to describe observational or experimental data. Biom., 11, 
239. (Correction, Biom., 12, 259.) 

Pearson, K. (1916f). On a brief proof of the fundamental formula for testing the goodness of 
fit of frequency-distributions and on the probable error of P. Phil. Mag., (6), 31, 369. 

Pearson, K. (1917). On the probable error of biserial η. Biom., 11, 292.

Pearson, K., and Young, A. W. (1918). On the product-moments of various orders of the normal
correlation surface of two variates. Biom., 12, 86.

Pearson, K., and Pairman, E. (1919). On corrections for the moment-coefficients of limited
range frequency-distributions when there are finite or infinite ordinates and any slopes 
at the terminals of the range. Biom., 12, 231. 

Pearson, K. (1919). On generalised Tchebycheff theorems in the mathematical theory of 
statistics. Biom., 12, 284. 

Pearson, K. (1920a). The fundamental problem of practical statistics. Biom., 13, 1. 

Pearson, K. (1920b). Notes on the history of correlation. Biom., 13, 25.

Pearson, K. (1920c). On the Construction of Tables and on Interpolation. Part I. Univariate 
Tables. Part II. Bivariate Tables. Tracts for Computers, Nos. 2 and 3. Cambridge 
University Press. 

Pearson, K. (1921). On a general method of determining the successive terms in a skew regression
line. Biom., 13, 296.

Pearson, K. (1922a, 1923). On the χ² test of goodness of fit. Biom., 14, 186; and: Further
note, Biom., 14, 418.

Pearson, K., and Pearson, E. S. (1922b). On polychoric coefficients of correlation. Biom.,
14, 127.

Pearson, K., and Elderton, E. M. (1923a). On the variate-difference method. Biom.,
14, 281.

Pearson, K. (1923b). On the correction necessary for the correlation ratio η. Biom., 14, 412.

Pearson, K. (1923c). Notes on skew frequency surfaces. Biom., 15, 222; and: On non-skew 
frequency surfaces. Biom., 15, 231. 

Pearson, K. (1924a). Note on Professor Romanovsky’s generalisation of my frequency curves. 
Biom., 16, 116. 

Pearson, K. (1924b). On the moments of the hypergeometrical series. Biom., 16, 157.

Pearson, K. (1924c). On a certain double hypergeometrical series and its representation by
continuous frequency surfaces. Biom., 16, 172.

Pearson, K. (1924d). On the mean error of frequency-distributions. Biom., 16, 198. 

Pearson, K. (1924e). Historical note on the origin of the normal curve. Biom., 16, 402. 

Pearson, K. (1925a). The fifteen-constant bivariate frequency surface. Biom., 17, 268. 

Pearson, K. (1925b). On first-power methods of finding correlation. Biom., 17, 459.

Pearson, K. (1926a). Researches on the mode of distribution of the constants of samples taken 
at random from a bivariate normal population. Proc. Roy. Soc., A, 112, 1. 

Pearson, K. (1926b). On the coefficient of racial likeness. Biom., 18, 105.

Pearson, K., Jeffery, G. B., and Elderton, E. M. (1929). On the distribution of the first product-moment
coefficient in samples drawn from an indefinitely large normal population.
Biom., 21, 164. 

Pearson, K. (1931a). On the nature of the relationship between two of ‘Student’s’ variates
(z₁ and z₂) when samples are taken from a bivariate normal population. Biom., 22, 405;
Some properties of ‘Student’s’ z. Biom., 23, 1; and: Further remarks on the z-test.
Biom., 23, 408.

Pearson, K. (1931b). Appendix to a paper by Professor Tokishige Hojo. On the standard error
of the median to a third approximation, etc. Biom., 23, 361.

Pearson, K., and Pearson, M. V. (1931c, 1932). On the mean character and variance of a ranked
individual, and on the mean and variance of the intervals between ranked individuals.
Part I. Symmetrical Distributions (Normal and Rectangular). Biom., 23, 364. Part II.
Case of certain skew curves. Biom., 24, 203. 

Pearson, K. (1931d). Historical note on the distribution of the standard deviation of samples 
of any size drawn from an indefinitely large normal parent population. Biom., 23, 416. 

Pearson, K., Stouffer, S. A., and David, F. N. (1932a). Further applications in statistics of the
Tₘ(x) Bessel function. Biom., 24, 293.

Pearson, K. (1932b). Experimental discussion of the (χ², P) test for goodness of fit. Biom., 24, 351.

Pearson, K. (1933a). On the applications of the double Bessel function K_{τ₁,τ₂}(x) to statistical
problems. Part I. Theoretical. Biom., 25, 158. 

Pearson, K. (1933b). On a method of determining whether a sample of given size n supposed to
have been drawn from a parent population having a known probability integral has 
probably been drawn at random. Biom., 25, 379. 

Pearson, K. (1934). On a new method of determining ‘ goodness of fit’. Biom., 26, 425. 

Pearson, K. (1935). On the corrections for broad categories, being a note on Mr. Wisniewski’s 
memoir. Biom., 27, 364. 

Pearson, K. (1936). Method of moments and method of maximum likelihood. Biom., 28, 34. 

Peek, R. L. (1937). Test of an observed difference in the frequency of two results. J. Am. Stat.
Ass., 32, 532.

Peierls, R. E. (1935). Statistical error in counting experiments. Proc. Roy. Soc., A, 149, 467.

Peiser, M. A. (1943). Asymptotic formulae for significance levels of certain distributions. Ann.
Math. Stats., 14, 56.

Pepper, J. (1929). Studies in the theory of sampling. Biom., 21, 231.

Pepper, J. (1932). The sampling distribution of the third moment coefficient—an experiment.
Biom., 24, 55.

Perlo, V. (1933). On the distribution of ‘Student’s’ ratio for samples of three drawn from a
rectangular population. Biom., 25, 203.

Persons, W. M. (1928). The construction of index numbers. Houghton Mifflin, Cambridge, Mass. 

Pietra, G. (1925). The theory of statistical relations, with special reference to cyclical series. 
Metron, 4, No. 3-4, 383. 

Pietra, G. (1932a). Nuovi contributi alla metodologia degli indici di variabilità e di concentrazione.
Att. R. Ist. Veneto di Sci., 989.

Pietra, G. (1932b). Dell’ interpolazione parabolica nel caso in cui entrambi i valori delle variabili
sono affetti da errori accidentali. Metron, 9, Nos. 3-4, 77. 

Pietra, G. (1934). Statistica. (2 vols.) Giuffré, Milan. 

Pitman, E. J. G. (1936). Sufficient statistics and intrinsic accuracy. Proc. Camb. Phil. Soc.,
32, 567.

Pitman, E. J. G. (1937a, 1938). Significance tests which may be applied to samples from any
population. Supp. J.R.S.S., 4, 119; II. The correlation coefficient test. Supp. J.R.S.S.,
4, 225; III. The analysis of variance test. Biom., 29, 322.

Pitman, E. J. G. (1937b). The ‘closest’ estimates of statistical parameters. Proc. Camb. Phil.
Soc., 33, 212.

Pitman, E. J. G. (1939a). The estimation of location and scale parameters of a continuous population
of any given form. Biom., 30, 391.

Pitman, E. J. G. (1939b). Tests of hypotheses concerning location and scale parameters. Biom.,
31, 200.

Pitman, E. J. G. (1939c). A note on normal correlation. Biom., 31, 9.


Pizzetti, E. (1939). Osservazioni sulle medie esponenziali e baso-esponenziali. Metron, 13,
No. 4, 3.

Poisson, S. D. (1837). Recherches sur la probabilité des jugements, etc. Paris.

Pollak, L. W. (1926). Rechentafeln zur harmonischen Analysis. Barth, Leipzig.

Pollak, L. W. (1927). Periodogramme hochfrequenten Schwankungen meteorologischer
Elemente. Met. Zeit., 4, 121.

Pollak, L. W., and Kaiser, F. (1935). Méthode numérique de J. Fuhrich pour le calcul des
périodicités, sa mise à l’épreuve et son application aux mouvements polaires. Rev. Stat.
Tchéchoslovaque, 16, 13.

Pollard, H. S. (1934). On the relative stability of the median and arithmetic mean, with particular
reference to certain frequency-distributions which can be dissected into normal
distributions. Ann. Math. Stats., 5, 227.

Pólya, G. (1920). Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das
Momentproblem. Math. Zeit., 8, 173. 

Pólya, G. (1923). Herleitung des Gauss’schen Gesetzes aus einer Funktionalgleichung. Math.
Zeit., 18, 96.

Pólya, G. (1931). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré,
1, 117.


Pólya, G. (1937). Zur Kinematik der Geschiebebewegung. Mitt. Versuchs. Wasserbau an der
Eid. Tech. Hochschule. Zurich.

Pólya, G. (1938a). Sur l’indétermination d’un problème voisin du problème des moments. Comptes
rendus, 207, 708.

Pólya, G. (1938b). Sur la promenade au hasard dans un réseau de rues. Actualités Scientifiques
et Industrielles, No. 734, Paris. Hermann et Cie. 

Powell, R. W. (1930). Successive integration as a method for finding long-period cycles. Ann.
Math. Stats., 1, 123.

Pretorius, S. J. (1930). Skew bivariate frequency surfaces examined in the light of numerical
illustrations. Biom., 22, 109.

Prokopovič, S. N. (1935). La corrélation des séries quantitatives. Rev. Stat. Tchéchoslovaque,
16, 64. 

Przyborowski, J., and Wileński, H. (1935a). Sur les erreurs de la première et de la seconde
catégorie dans la vérification des hypothèses concernant la loi de Poisson. Comptes
rendus, 200, 1460.

Przyborowski, J., and Wileński, H. (1935b). Statistical principles of routine work in testing
clover seed for dodder. Biom., 27, 273.

Przyborowski, J., and Wileński, H. (1936). Note on the application of a theorem of Frau
Pollaczek-Geiringer. Biom., 28, 187.

Przyborowski, J., and Wileński, H. (1940). Homogeneity of results in testing samples from
Poisson series, etc. Biom., 31, 313. 


Quensel, C. E. (1936). A method of determining the regression curve when the marginal distribution
is of the normal logarithmic type. Ann. Math. Stats., 7, 196.

Quensel, C. E. (1938). The distributions of the second moment and of the correlation coefficient
in samples from populations of Type A. Lunds Univ. Årsskr., N.F. 34, 4, 1.


Raikov, D. (1938). On the decomposition of Gauss and Poisson laws. Bull. Acad. Sci. U.S.S.R.
Sér. Math., 1, 91.

Reed, L. J. (1922). Fitting straight lines. Metron, 1, No. 3, 54.

Regan, F. (1936, 1938). The application of the theory of admissible numbers to time-series with
constant probability. Trans. Am. Math. Soc., 36, 511; and: The application of the
theory of admissible numbers to time-series with variable probability. Am. J. Maths.,
Oo, 007. 


Reichenbach, H. (1935). Wahrscheinlichkeitslehre, Leiden.

Reichenbach, H. (1937). Les fondements logiques du calcul des probabilités. Ann. Inst. H.
Poincaré, 7, 267.

Reiersøl, O. (1940). A method for recurrent computation of all the principal minors of a determinant
and its application in confluence analysis. Ann. Math. Stats., 11, 193.

Reiersøl, O. (1941). Confluence analysis by means of lag moments and other methods of confluence
analysis. Econometrika, 9, 1.

Rhodes, E. C. (1921). Smoothing. Tracts for Computers, No. 6. Cambridge University Press.

Rhodes, E. C. (1923, 1925). On a certain skew correlation surface. Biom., 14, 355, and 17,
314.

Rhodes, E. C. (1924, 1925). On the problem whether two given samples can be supposed to have
been drawn from the same population. Biom., 16, 239 and Metron, 5, 3.

Rhodes, E. C. (1925). On sampling. Metron, 5, Nos. 2-3, 3.

Rhodes, E. C. (1927). The precision of means and standard deviations when the individual errors
are correlated. J.R.S.S., 90, 135.

Rhodes, E. C. (1928). On the normal correlation function as an approximation to the distribution
of paired drawings. J.R.S.S., 91, 548.

Rhodes, E. C. (1930). On the fitting of parabolic curves to statistical data. J.R.S.S., 93, 569.

Rhodes, E. C. (1936). The precision of index numbers. J.R.S.S., 99, 142.

Rice, S. O. (1938). Van Uven’s theorem in probability theory and a self-reciprocal Hankel transform.
Quart. J. Maths., 9, 1.

Rice, S. O. (1939). The distribution of the maxima of a random curve. Am. J. Maths., 61, 409.

Richards, H. I. (1931). Analysis of the spurious effect of high intercorrelation of independent
variables on regression and correlation coefficients. J. Am. Stat. Ass., 26, 21.

Ricker, W. E. (1937). The concept of confidence or fiducial limits applied to the Poisson frequency.
J. Am. Stat. Ass., 32, 349.

Rider, P. R. (1929). On the distribution of the ratio of mean to standard deviation in small
samples from non-normal populations. Biom., 21, 124.

Rider, P. R. (1931a). On small samples from certain non-normal universes. Ann. Math. Stats.,
2, 48.

Rider, P. R. (1931b). A note on small sample theory. J. Am. Stat. Ass., 26, 172.

Rider, P. R. (1932). On the distribution of the correlation coefficient in small samples. Biom.,
24, 382.

Rider, P. R. (1933). Criteria for rejection of observations. Washington University Studies,
New Series, Science and Technology, No. 8.

Rider, P. R. (1934). The third and fourth moments of the generalised Lexis theory. Metron,
12, No. 1, 185.

Rider, P. R. (1936). Annual survey of statistical technique: Developments in the analysis of
multivariate data. Econometrika, 4, 264.

Rietz, H. L., editor (1924). Handbook of Mathematical Statistics. Houghton Mifflin, Boston.

Rietz, H. L. (1931a). Note on the distribution of the standard deviation of sets of three variates
drawn at random from a rectangular distribution. Biom., 23, 424.

Rietz, H. L. (1931b). On certain properties of frequency-distributions obtained by a linear
fractional transformation of the variates of a given distribution. Ann. Math. Stats.,
2, 38.

Rietz, H. L. (1932). A simple non-normal correlation surface. Biom., 24, 288.

Rietz, H. L. (1937). Some topics in sampling theory. Bull. Am. Math. Soc., 43, 209.

Rietz, H. L. (1938). On a recent advance in statistical inference. Am. Math. Monthly, 45, 149.

Rietz, H. L. (1939). On the distribution of the ‘Student’ ratio for small samples from certain
non-normal populations. Ann. Math. Stats., 10, 265. 

Risser, R. (1935-7). Exposé des principes de la statistique mathématique. J. Soc. Stat. Paris, 
7G, 2a; 77,031 ; 78, 40 


Ritchie-Scott, A. (1918). The correlation coefficient of a polychoric table. Biom., 12, 93.

Ritchie-Scott, A. (1921). The incomplete moments of a normal solid. Biom., 13, 401.

Robb, R. A. (1929). The variate-difference method of seasonal variation. J. Am. Stat. Ass.,
24, 250.

Robb, R. A. (1930). Modifications of the link relative and interpolation methods of determining
seasonal variation. Ann. Math. Stats., 1, 352. 

Robinson, S. (1933). An experiment regarding the χ²-test. Ann. Math. Stats., 4, 285.

Roff, M. (1937). Relation between results obtainable with raw and corrected correlation coefficients
in multiple factor analysis. Psychometrika, 2, 35.

Romanovsky, V. (1923). Note on the moments of a binomial about its mean. Biom., 15, 410. 

Romanovsky, V. (1924). Generalisation of some types of the frequency-curves of Professor 
Pearson. Biom., 16, 106. 

Romanovsky, V. (1925a). On the moments of standard deviation and of correlation coefficient 
in samples from normal. Metron, 5, No. 4, 3. 

Romanovsky, V. (1925b). On the moments of the hypergeometrical series. Biom., 17, 57.

Romanovsky, V. (1926). On the distribution of the regression coefficient in samples from normal 
population. Bull. Acad. Sci. U.S.S.R., (6), 10, 643. 

Romanovsky, V. (1927). Note on orthogonalising series of functions and interpolation. Biom., 
19, 93. 

Romanovsky, V. (1928). On the criteria that two given samples belong to the same normal 
population. Metron, 7, No. 3, 3. 

Romanovsky, V. (1929). On the moments of means of functions of one and more random variables. 
Metron, 8, Nos. 1-2, 251. 

Romanovsky, V. (1931a). Sulla probabilità a posteriori. Giorn. Ist. Ital. Att., 2.

Romanovsky, V. (1931b). Sulle regressione multiple. Giorn. Ist. Ital. Att., 2.

Romanovsky, V. (1931c, 1932a, 1933a). Généralisations d’un théorème de M. Slutzky. Comptes
rendus, 192, 718; Sur la loi sinusoïdale limite. Rend. Circ. Mat. Palermo, 56, 1; Sur
une généralisation de la loi sinusoïdale limite. Ibid., 57.

Romanovsky, V. (1932b). Due nuovi criteri di controllo sull’ andamento casuale di una successione
di valori. Giorn. Ist. Ital. Att., 3, 203.

Romanovsky, V. (1933b). On a property of the mean ranges in samples from a normal population
and on some integrals of Professor Hojo. Biom., 25, 195. 

Romanovsky, V. (1934). Su due problemi di distribuzione casuale. Giorn. Ist. Ital. Att., 5, 196. 

Romanovsky, V. (1936a). Recherches sur les chaines de Markoff. Acta Math., 66, 147. 

Romanovsky, V. (1936b). Note on the method of moments. Biom., 28, 188.

Romanovsky, V. (1938). Analytical inequalities and statistical tests. (Russian, with English 
summary). Bull. Acad. Sci. U.S.S.R. Sér. Math., 4, 457. 

Roos, C. F. (1934). Dynamic Economics. Bloomington, Indiana. 

Roos, C. F. (1936). Annual survey of statistical technique. The correlation and analysis of
time-series. Econometrika, 4, 368.

Roos, C. F. (1937). A general invariant criterion of fit for lines and planes where all variates are 
subject to error. Metron, 13, No. 1, 3. 

Roy, S. N. (1938). A geometrical note on the use of rectangular co-ordinates in the theory of 
sampling distributions connected with a multivariate normal population. Sankhyā,
3, 273.

Roy, S. N. (1939a). A note on the distribution of the Studentised D²-statistic. Sankhyā, 4, 373.

Roy, S. N. (1939b). p-statistics, or some generalisations on analysis of variance appropriate to
multivariate problems. Sankhyā, 4, 381.

Roy, S. N. (1942a). The sampling distribution of p-statistics and certain allied statistics on the
non-null hypothesis. Sankhyā, 6, 15.

Roy, S. N. (1942b). Analysis of variance for multivariate normal populations, etc. Sankhyā,
6, 35. 


490 BIBLIOGRAPHY 


Salvemini, T. (1934). Ricerche sperimentali sull’ interpolazione grafica di istogrammi. Metron,
11, No. 4, 83.

Salvemini, T. (1939). L’ indice di cograduazione del Gini nel caso di serie statistiche con
ripetizioni. Metron, 13, No. 4, 27.

Salvosa, L. R. (1930). Tables of Pearson’s Type III function. Ann. Math. Stats., 1, 191.

Samuelson, P. A. (1942). A method of determining explicitly the coefficients of a characteristic
equation. Ann. Math. Stats., 13, 424.

Samuelson, P. A. (1943). Fitting general Gram-Charlier series. Ann. Math. Stats., 14, 179.

Sandon, F. (1924). Note on the simplification of the calculation of abruptness coefficients to
correct crude moments. Biom., 16, 193. 

Sansone, G. (1933). La chiusura dei sistemi ortogonali di Legendre, di Laguerre e di Hermite 
rispetto alle funzioni di quadrati sommabili. Giorn. Ist. Ital. Att., 4, 71. 

Sasuly, M. (1934). Trend Analysis of Statistics. Brookings Institution, Washington, D.C.

Satterthwaite, F. E. (1943). Generalised Poisson distribution. Ann. Math. Stats., 13, 410.

Savur, S. R. (1937a). The use of the median in tests of significance. Proc. Indian Acad. Sci.,
A, 5, 564. 

Savur, S. R. (1937b). A new solution of a problem in inverse probability. Proc. Indian Acad.
Sci., A, 5, 222. 

Savur, S. R. (1939). A note on the arrangement of incomplete blocks when k = 3 and λ = 1.
Ann. Eug. Lond., 9, 45. 

Savur, S. R. (1941). A test of significance in approximate periodogram analysis. Sankhyā, 6,
77. 

Scheffé, H. (1942a). On the theory of testing composite hypotheses with one constraint. Ann.
Math. Stats., 13, 280. 

Scheffé, H. (1942b). On the ratio of the variances of two normal samples. Ann. Math. Stats.,
13, 371.

Scheffé, H. (1943). Statistical inference in the non-parametric case. Ann. Math. Stats., 14,
305.

Schmidt, E. (1934). Über die Charlier-Jordansche Entwicklung einer willkürlichen Funktion nach
der Poissonschen Funktion und ihrer Ableitungen. Zeit. ang. Math. und Mech., 13, 139.

Schmidt, R. (1934). Statistical analysis of one-dimensional distributions. Ann. Math. Stats.,
5, 30.

Schultz, H. (1930). The standard error of a forecast from a curve. J. Am. Stat. Ass., 25, 139.

Schultz, H. (1933). The standard error of the coefficient of elasticity of demand. J. Am. Stat.
Ass., 28, 64. 

Schultz, H. (1939). A misunderstanding in index number theory: the true Konüs condition on
cost-of-living index numbers and limitations. Econometrika, 7, 1.

Schultz, T. W., and Snedecor, E. (1933). Analysis of variance as an effective method of handling
the time element in certain economic statistics. J. Am. Stat. Ass., 28, 14. 

Schumann, T. E. W. (1938). A general graduation formula for the smoothing of time series.
Phil. Mag., 26, 970. 

Schumann, T. E. W., and Hofmeyer, W. L. (1942). The problem of auto-correlation of
meteorological time-series, etc. Q.J. Met. Soc., 68, 177.

Schuster, Sir Arthur (1898). On the investigation of hidden periodicities with application to
a supposed 26-day period of meteorological phenomena. Terr. Mag., 3, 13. 

Schuster, Sir Arthur (1899). The periodogram of the magnetic declination as obtained from
the records of the Greenwich Observatory during the years 1871-1895. Trans. Camb. 
Phil. Soc., 18, 107. 

Schuster, Sir Arthur (1906). On the periodicities of sunspots. Phil. Trans., A, 206, 69.

Scugarev, A. N. (1932). Über die Mechanik der Massenprozesse (Kollektivgegenstandlehre),
Metron, 9, Nos. 3-4, 1389. 

Seal, H. L. (1940). Tests of a mortality table graduation. J. Inst. Act., 71, No. 330.




Segal, I. E. (1938). Fiducial distribution of several parameters with application to a normal
system. Proc. Camb. Phil. Soc., 34, 41. 

Sheffer, H. M. (1935). Concerning some methods of best approximation and a theorem of
Birkhoff. Am. J. Maths., 57, 587. 

Sheppard, W. F. For bibliography see Ann. Eug. Lond., 1937, 8, 13.

Sheppard, W. F. (1898a). On the application of the theory of error to cases of normal distributions
and normal correlations. Phil. Trans., A, 192, 101, and Proc. Roy. Soc., 62, 170. 

Sheppard, W. F. (1898b). On the calculation of the most probable values of frequency constants
for data arranged according to equidistant divisions of a scale. Proc. Lond. Math. Soc., 
29, 353. 

Sheppard, W. F. (1914). Fitting of polynomials by the method of least squares. Proc. Lond.
Math. Soc., (2), 13, 97. 

Sheppard, W. F. (1929). The fit of a formula for discrepant observations. Phil. Trans., A,
228, 199. 

Sheppard, W. F. (1939, posthumous). The Probability Integral. British Ass. Math. Tables, vol. 7.
Cambridge University Press. 

Shewhart, W. A., and Winters, F. W. (1928). Small samples—new experimental results.
J. Am. Stat. Ass., 23, 144. 

Shewhart, W. A. (1931). The Economic Control of Quality of a Manufactured Product. van
Nostrand, New York. 

Shohat, J. (1929). Inequalities for moments of frequency functions and for various statistical
constants. Biom., 21, 361. 

Shohat, J. (1930). Stieltjes integrals in mathematical statistics. Ann. Math. Stats., 1, 73.

Shohat, J. (1935). On the development of functions in series of orthogonal polynomials. Bull.
Am. Math. Soc., 41, 49. 

Simaika, J. B. (1941). On an optimum property of two important statistical tests. Biom.,
32, 70. 

Simaika, J. B. (1942). Interpolation for fresh probability levels between the standard table levels
of a function. Biom., 32, 263.

Simon, H. A. (1943). Symmetric tests of the hypothesis that the mean of one normal population
exceeds that of another. Ann. Math. Stats., 14, 149. 

Simon, L. E. (1941). The Engineer’s Manual of Statistical Methods. John Wiley, New York.

Simonsen, W. (1937). On the distributions of certain functions of samples from a multivariate
infinite population. Skand. Akt., 20, 200. 

Sipos, A. (1930). Practical application of Jordan’s method for trend measurement. Hornyansky,
Budapest. 

Slutzky, E. (1914). On the criterion of goodness of fit of regression lines and on the best method of
fitting them to data. J.R.S.S., 77, 78. 

Slutzky, E. (1925). Über stochastische Asymptoten und Grenzwerte. Metron, 5, No. 3, 3.

Slutzky, E. (1934). Alcune applicazioni dei coefficienti di Fourier all’ analisi delle funzioni aleatorie
stazionarie. Giorn. Ist. Ital. Att., 5, 435.

Slutzky, E. (1937a). Qualcune proposizione relativa alla teoria delle funzioni aleatorie. Giorn.
Ist. Ital. Att., 8, 183.

Slutzky, E. (1937b). The summation of random causes as the source of cyclic processes.
Econometrika, 5, 105.

Smirnoff, N. (1935). Über die Verteilung des allgemeinen Gliedes in der Variationsreihe. Metron,
12, No. 2, 59.

Smirnoff, N. (1936). Sur la distribution de ω². Comptes rendus, 202, 449.

Smith, C. D. (1930). On generalised Tchebycheff inequalities in mathematical statistics. Am.
J. Maths., 52, 109. 

Smith, C. D. (1939). On Tchebycheff approximations for decreasing functions. Ann. Math. Stats.,
10, 190. 




Smith, H. Fairfield (1936). A discriminant function for plant selection. Ann. Eug. Lond.,
7, 240. 

Smith, K. (1916). On the ‘best’ values of the constants in frequency-distributions. Biom.,
11, 262. 

Smith, K. (1918). On the standard deviation of the adjusted and interpolated values of an observed
polynomial function and its constants etc. Biom., 12, 1.

Smith, K. (1922). The standard deviations of fraternal and parental correlation coefficients.
Biom., 14, 1. 

Snedecor, G. W., and Irwin, M. R. (1933). On the chi-square test for homogeneity. Iowa State
College J. Sci., 8, 75. 

Snedecor, G. W., and Cox, G. M. (1934a). Disproportionate sub-class numbers in tables of
multiple classification. Iowa Agr. Exp. Station Res. Bull. No. 180.

Snedecor, G. W. (1934b). Calculation and Interpretation of Analysis of Variance and Covariance.
Collegiate Press, Ames, Iowa. 

Snedecor, G. W. (1935). Analysis of covariance of statistically controlled grades. J. Am. Stat.
Ass., 30, Supp., 263. 

Snow, E. C. (1911). On restricted lines and planes of closest fit to systems of points in any number 
of dimensions. Phil. Mag. (6), 21, 367. 

Solomon, R. S. (1939). An index of conformity based on the J-curve hypotheses. Sociometry,
2, 63. 

Soper, H. E. (1913). On the probable error of the correlation coefficient to a second
approximation. Biom., 9, 91.

Soper, H. E. (1914). On the probable error of the bi-serial expression for the correlation coefficient. 
Biom., 10, 384.

Soper, H. E. and others (1917). On the distribution of the correlation coefficient in small samples.
Biom., 11, 328. 

Soper, H. E. (1922). Frequency Arrays. Cambridge University Press. 

Soper, H. E. (1929a). The general sampling distribution of the multiple correlation coefficient. 
J.R.S.S., 92, 445.

Soper, H. E. (1929b). The interpretation of periodicity in disease prevalence. J.R.S.S., 92, 34.

‘Sophister’ (1928). Discussion of small samples from an infinite skew universe. Biom., 20A,
389. 

Spearman, C. (1906). A footrule for measuring correlation. Brit. J. Psych., 2, 89.

Spearman, C. (1907). Demonstration of formulæ for true measurement of correlation. Am. J.
Psych., 18, 161. 

Spearman, C. (1910). Correlation calculated from faulty data. Brit. J. Psych., 3, 271.

Starkey, D. M. (1938). A test of significance of the difference between means of samples from two 
normal populations without assuming equal variances. Ann. Math. Stats., 9, 201. 

Starkey, D. M. (1939). The distribution of the multiple correlation coefficient in periodogram 
analysis. Ann. Math. Stats., 10, 327. 

Steffensen, J. F. (1923). Matematisk Iagtagelseslaehre. Copenhagen.

Steffensen, J. F. (1930). Some Recent Researches in the Theory of Statistics and Actuarial Science.
Cambridge University Press. 

Steffensen, J. F. (1934). On certain measures of dependence between statistical variables.
Biom., 26, 251. 

Steffensen, J. F. (1936). Free functions and the Student-Fisher theorem. Skand. Akt., 19, 108.

Steffensen, J. F. (1937). On the semi-normal distribution. Skand. Akt., 20, 60.

Steinhaus, H. (1923). Les probabilités dénombrables et leur rapport à la théorie de la mesure.
Fund. Math., 4, 286. 

Stekloff, W. (1914). Quelques applications nouvelles de la théorie de fermeture au problème
de représentation approchée de fonctions et au problème des moments. Mem. Acad.
Imp. Sci. St. Pét., 32, No. 4.




Sterne, T. E. (1934). The accuracy of least-square solutions. Proc. Nat. Acad. Sci., 20, 565 
and 601. 

Stevens, W. L. (1936). The analysis of interference. J. Genetics, 32, 51. 

Stevens, W. L. (1937a). The truncated normal distribution. Ann. Appl. Biol., 24, 847. 

Stevens, W. L. (1937b). Significance of grouping. Ann. Eug. Lond., 8, 57.

Stevens, W. L. (1938a). The distribution of entries in a contingency table with fixed marginal 
totals. Ann. Eug. Lond., 8, 238. 

Stevens, W. L. (1938b). The completely orthogonalised Latin square. Ann. Eug. Lond., 9, 82.

Stevens, W. L. (1939a). Solution to a geometrical problem in probability. Ann. Eug. Lond.,
9, 315. 

Stevens, W. L. (1939b). Tests of significance for extra-sensory perception data. Psych. Rev.,
46, 142. 

St. Georgescu, N. (1932). Further contributions to the sampling problem. Biom., 24, 65.

Stieltjes, J. (1918). Recherches sur les fractions continues. Œuvres, Groningen.

Stock, J. S., and Frankel, L. R. (1939). The allocation of samplings among several strata. Ann.
Math. Stats., 10, 288. 

Stouffer, S. A., and Tippits, C. (1933). Tests of significance in applying Westergaard’s method
of expected cases to sociological data. J. Am. Stat. Ass., 28, 293. 

Stouffer, S. A. (1934). A coefficient of combined partial correlation with an example from
sociological data. J. Am. Stat. Ass., 29, 70. 

Stouffer, S. A. (1936a). Evaluating the effect of inadequately measured variables in partial
correlation analysis. J. Am. Stat. Ass., 31, 348.

Stouffer, S. A. (1936b). Reliability coefficients in a correlation matrix. Psychometrika, 1, 17.

‘Student’ (W. S. Gosset) (1907). On the error of counting with a haemacytometer. Biom.,
5, 351.

‘Student’ (1908a). On the probable error of a mean. Biom., 6, 1.

‘Student’ (1908b). On the probable error of a correlation coefficient. Biom., 6, 302.

‘Student’ (1909). On the distribution of means of samples which are not drawn at random.
Biom., 7, 210.

‘Student’ (1913). The correction to be made to the correlation ratio for grouping. Biom.,
9, 316.


‘Student’ (1914). The elimination of spurious correlation due to position in time or space.
Biom., 10, 179. 

‘Student’ (1919). An explanation of deviations from Poisson’s law in practice. Biom., 12, 211.

‘Student’ (1921). An experimental determination of the probable error of Dr. Spearman’s
correlation coefficient. Biom., 13, 263. 

‘Student’ (1927). Errors of routine analysis. Biom., 19, 151.

‘Student’ (1931a). On the z-test. Biom., 23, 407.

‘Student’ (1931b). Yield trials. Article in Bailliére’s Encyclopedia of Scientific Agriculture.

‘Student’ (1931c). The Lanarkshire milk experiment. Biom., 23, 398.

‘Student’ (1938). Comparison between balanced and random arrangements of field plots.
Biom., 29, 361. 

Stumpff, K. (1926). Fehlertheoretische Untersuchungen zur Periodogrammanalyse. Astr. Nach.,
226, 378. 

Stumpff, K. (1937). Grundlagen und Methoden der Periodenforschung. Berlin.

Subramanian, S. (1935). On a property of partial correlation. J.R.S.S., 98, 129.

Sukhatme, P. V. (1935). Contribution to the theory of the representative method. Supp.
J.R.S.S., 2, 253.

Sukhatme, P. V. (1936a). A contribution to the problem of two samples. Proc. Indian Acad.
Sci., 2, A, 584. 

Sukhatme, P. V. (1936b). On the analysis of k samples from exponential populations with especial
reference to the problem of random intervals. Stat. Res. Mem., 1, 94. 




Sukhatme, P. V. (1937a). Tests of significance for samples of the χ² population with two degrees
of freedom. Ann. Eug. Lond., 8, 52. 

Sukhatme, P. V. (1937b). The problem of k samples for Poisson population. Proc. Nat. Inst.
Sci. India, 3, 297. 

Sukhatme, P. V. (1938a). On the distribution of χ² in samples from a Poisson series. Supp.
J.R.S.S., 5, 75.

Sukhatme, P. V. (1938b). On Fisher and Behrens’ test of significance for the difference in means
of two normal samples. Sankhyā, 4, 39.

Sukhatme, P. V. (1938c). On bipartitional functions. Phil. Trans. Roy. Soc., A, 237, 375.

Sukhatme, P. V. (1944). Moments and product-moments of moment-statistics for samples of the
finite and infinite populations. Sankhyā, 6, 363.

Swaroop, S. (1938). Tables of the exact values of probabilities for testing the significance of 
differences between proportions based on pairs of small samples. Sankhyā, 4, 73.

Swed, F. S., and Eisenhart, C. (1943). Tables for testing randomness in a sequence of
alternatives. Ann. Math. Stats., 14, 66.


Tang, P. C. (1938). The power function of the analysis of variance tests with tables and
illustrations of their use. Stat. Res. Mem., 2, 126.

Tang, Y. (1938). Certain statistical problems arising in plant breeding. Biom., 30, 29.

Tappan, M. (1927). On partial multiple correlation coefficients in a universe of manifold char- 
acteristics. Biom., 19, 39. 

Tartler, A. (1935). On a certain class of orthogonal polynomials. Am. J. Maths., 57, 627.

Taucer, R. (1936). I fenomeni di selezione a la teoria dei gruppi. Giorn. Ist. Ital. Att., 7, 16.

Tchebycheff, P. L. (1907). Œuvres. 2 vols. St. Pétersbourg: including: Sur une formule
d’analyse, 1, 701 (1854); Sur les fractions continues, 1, 203 (1856); Sur une nouvelle
série, 1, 381 (1858); Sur l’interpolation par la méthode des moindres carrés, 1, 478
(1859); Sur l’interpolation des valeurs équidistantes, 2, 219 (1875).

Tedin, O. (1931). The influence of systematic plot arrangements upon the estimate of error in
field experiments. J. Agr. Sci., 21, 191. 

Thiele, T. N. (1931). Theory of Observations. Reprint in Ann. Math. Stats., 2, 165, of the English
version published in 1903. 

Thompson, C. M., Pearson, E. S., Comrie, L. J., and Hartley, H. O. (1941a). Tables of
percentage points of the incomplete beta-function. Biom., 32, 151.

Thompson, C. M. (1941b). Tables of percentage points of the χ²-distribution. Biom., 32, 187.

Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in
view of the evidence of two samples. Biom., 25, 286. 

Thompson, W. R. (1935). On a criterion for the rejection of observations and the distribution of
the ratio of deviation to sampling standard deviation. Ann. Math. Stats., 6, 214. 

Thompson, W. R. (1936). On confidence ranges for the median and other expectation
distributions for populations of unknown distribution form. Ann. Math. Stats., 7, 122.

Thompson, W. R. (1938). Biological applications of normal range and associated significance
tests in ignorance of original distribution forms. Ann. Math. Stats., 9, 281. 

Thomson, G. H. (1916). A hierarchy without a general factor. Brit. J. Psych., 8, 271.

Thomson, G. H. (1919a). The criterion of goodness of fit of psychophysical curves. Biom., 12,
216. 

Thomson, G. H. (1919b). On the degree of perfection of hierarchical order among correlation
coefficients. Biom., 12, 355, and (correction), 15, 150. 

Thomson, G. H. (1935). On complete families of correlation coefficients, etc. Brit. J. Psych.,
26, 63. 

Thomson, G. H. (1939). The factorial analysis of ability. Brit. J. Psych., 30, 71 and 105.

Thorndike, E. L. (1937). On correlations between measurements which are not normally
distributed. J. Educ. Psych., 28, 367.




Thouless, R. H. (1939). The effect of errors of measurement on correlation coefficients. Brit.
J. Psych., 29, 383.

Thurstone, L. L. (1935). Vectors of Mind. Chicago.

Thurstone, L. L. (1938). A new rotational method in factor analysis. Psychometrika, 3, 199.

Tinbergen, J. (1937). An Econometric Approach to Business Cycle Problems. Paris.

Tinbergen, J. (1938). On the theory of business cycle control. Econometrika, 6, 22.

Tintner, G. (1935). Prices and the Trade Cycle. Vienna.

Tintner, G. (1940). The Variate-Difference Method. Bloomington Press, Indiana.

Tintner, G. (1941). The variate-difference method: a reply. Econometrika, 9, 163.

Tippett, L. H. C. (1925). On the extreme individuals and the range of samples taken from a normal
population. Biom., 17, 364. 

Tippett, L. H. C. (1931). The Methods of Statistics. Williams and Norgate, London. 2nd edn.
1937. 

Tippett, L. H. C. (1932). A modified method of counting particles. Proc. Roy. Soc., A,
137, 434. 

Tippett, L. H. C. (1935). Some applications of statistical methods to the study of variation of 
quality in the production of cotton yarn. Supp. J.R.S.S., 2, 27. 

Todhunter, I. (1865). A History of the Mathematical Theory of Probability from the time of Pascal
to that of Laplace. Macmillan, London. 

Tornier, E. (1929). Wahrscheinlichkeitsrechnung und Zahlentheorie. J. rein. und ang. Math.,
160, 177. 

Tornier, E. (1930). Die Axiome der Wahrscheinlichkeitsrechnung. J. rein. und ang. Math.,
163, 45. 

Tornier, E. (1933). Grundlagen der Wahrscheinlichkeitsrechnung. Acta Math., 60, 239.

Tornier, E. (1936). Wahrscheinlichkeitsrechnung und allgemeine Integrationstheorie. Teubner,
Leipzig. 

Tornier, E. (1937). Verallgemeinerung des Rückschluss-Satzes der Wahrscheinlichkeitsrechnung.
Deutsche Math., 2, 469. 

Trachtenberg, H. L. (1921). Analysis of the periodogram. J.R.S.S., 84, 578.

Travers, R. M. W. (1939). The use of a discriminant function in the treatment of psychological 
group-differences. Psychometrika, 4, 25. 

Treloar, A. E., and Wilder, M. A. (1934). The adequacy of ‘Student’s’ criterion of deviations
in small sample means. Ann. Math. Stats., 5, 324. 

Tricomi, F. (1935, 1936a). Sulla rappresentazione di una legge di probabilità mediante esponenziali
di Gauss e la transformazione di Laplace. Giorn. Ist. Ital. Att., 6, 135 and 7, 42.

Tricomi, F. (1936b). Sulla media dei valori assoluti di errori seguente la legge di Gauss. Giorn.
Ist. Ital. Att., 7, 280.

Tricomi, F. (1937). Sul rapporto fra la media dei quadrati di più errori e il quadrato della media
dei loro valori assoluti. Giorn. Ist. Ital. Att., 8, 68 and 127. 

Tricomi, F. (1938). Les transformations de Fourier, Laplace, Gauss et leurs applications au calcul
des probabilités et à la statistique. Ann. Inst. H. Poincaré, 8, 111.

Truksa, L. (1940). The simultaneous distribution in samples of mean and standard deviation and
of mean and variance. Biom., 31, 256. 

Tschuprow, A. A. (1918, 1919a). Zur Theorie der Stabilität statistischer Reihen. Skand. Akt.,
1, 199 (1918), and 2, 80 (1919). 

Tschuprow, A. A. (1918b, 1921, 1923). On the mathematical expectation of the moments of
frequency-distributions. Biom., 12, 140 and 185; Biom., 13, 283; and Metron, 2, 
No. 3, 461 and No. 4, 646. 

Tschuprow, A. A. (1925). Grundbegriffe und Grundprobleme der Korrelationstheorie. Teubner,
Leipzig. (English translation as The Mathematical Theory of Correlation, William Hodge,
1939.)

Tschuprow, A. A., trans. by L. Isserlis (1928). The mathematical theory of the statistical
methods employed in the study of correlation in the case of three variables. Trans.
Camb. Phil. Soc., 23, 337.

Tschuprow, A. A. (1934). The mathematical foundations of the methods to be used in statistical
investigation of the dependence between two chance variables. Nordisk Statistik Tidskrift, 
5, 34. 

Turner, H. H. (1913). Tables for facilitating the use of harmonic analysis. Oxford University 
Press. 


Urban, F. M. (1918). Über den Begriff der mathematischen Wahrscheinlichkeit.
Vierteljahrschrift für Wiss. Phil. und Soz., 10.

Uspensky, J. V. (1937). Introduction to Mathematical Probability. McGraw-Hill, New York and
London. 


Vajda, S. (1939). Die Wahrscheinlichkeit einer bestimmten Auszahlungssumme. Skand. Akt.,
22, 10.

van der Pol, B. (1930). Oscillations sinusoïdales et de relaxation. L’onde électrique, juin-juillet,
Chiron, Paris.

van Kampen, E. R. (1937a). On the addition of convex curves and the densities of certain infinite
convolutions. Am. J. Maths., 59, 679.

van Kampen, E. R. and Wintner, A. (1937b, 1937c). Convolutions of distributions on convex
curves and the Riemann zeta-function. Am. J. Maths., 59, 175; and: On divergent
infinite convolutions. Ibid., 59, 635.

van Kampen, E. R. (1939a). On the asymptotic distribution of a uniformly almost periodic 
function. Am. J. Maths., 61, 729.

van Kampen, E. R. and Wintner, A. (1939b). A limit theorem for probability distributions on
lattices. Am. J. Maths., 61, 965.

van Uven, M. J. (1932). Compensazione degli errori di un rapporto. Metron, 10, No. 3, 
185. 

van Uven, M. J. (1939). Adjustment of a ratio. Ann. Eug. Lond., 9, 181.

Venn, J. A. (1888). The Logic of Chance. 3rd edn., Macmillan, London. (Out of print.) 

Vernon, P. E. (1936). A note on the standard error in the contingency-matching technique.
J. Educ. Psych., 27, 704. 

Villars, D. S., and Anderson, T. W. (1943). Some significance tests for normal bivariate
distributions. Ann. Math. Stats., 14, 141.

Ville, J. A. (1936a, b). Sur les suites indifférentes. Comptes rendus, 202, 1393; and: Sur la
notion de collectif. Ibid., 203, 26.

Ville, J. A. (1936c). Sur la convergence de la médiane des n premiers résultats d’une suite infinie
d’épreuves indépendantes. Comptes rendus, 203, 1309. 

Ville, J. A. (1939). Étude critique de la notion de collectif. Paris: Thèse.

Vinci, F. (1920). Sui coefficienti di variabilità. Metron, 1, No. 1, 62.

Vinci, F. (1934). Significant developments in business cycle theory. Econometrika, 2, 125.

Volterra, V. (1936). Les équations des fluctuations biologiques et le calcul des variations.
Comptes rendus, 202, 1935; Les équations canoniques. Ibid., 202, 2023; and: Sur
l’intégration des équations. Ibid., 202, 2113.

von Bortkiewicz, L. (1898). Das Gesetz der kleinen Zahlen. Teubner, Leipzig.

von Bortkiewicz, L. (1910). Zur Verteidigung des Gesetzes der kleinen Zahlen. Jahrb. Nat.
Ök. und Stat., (3), 39, 218.

von Bortkiewicz, L. (1915a). Über die Zeitfolge zufälliger Ereignisse. Bull. Inst. Int. de Stat.,
20, 2e livre.

von Bortkiewicz, L. (1915b). Realismus und Formalismus in der mathematischen Statistik.
Allg. Stat. Arkiv., 9, 225. 

von Bortkiewicz, L. (1917). Die Iterationen. Berlin.




von Bortkiewicz, L. (1922). Das Helmertsche Verteilungsgesetz für die Quadratsumme zufälliger
Beobachtungsfehler. Zeit. ang. Math. und Mech., 2, Heft 5.

von Bortkiewicz, L. (1931). The relation between stability and homogeneity. Ann. Math.
Stats., 2, 1.

von Mises, R. (1919a, b). Fundamentalsätze der Wahrscheinlichkeitsrechnung. Math. Zeit.,
4, 1; and: Grundlagen. Ibid., 5, 52.

von Mises, R. (1921). Das Problem der Iterationen. Zeit. ang. Math. und Mech., 1, 298. 

von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Springer, Berlin; 3rd rev.
edn., 1936; trans. as Probability, Statistics and Truth, 1939. W. Hodge, London. 

von Mises, R. (1931). Wahrscheinlichkeitsrechnung. Deuticke, Wien.

von Mises, R. (1933). Über Zahlenfolge die ein Kollektivähnliches Verhalten zeigen. Math.
Ann., 108, 757. 

von Mises, R. (1936a). Sul concetto di probabilità fondato sul limite di frequenze relative.
Giorn. Ist. Ital. Att., 7, 235. 

von Mises, R. (1936b). Les lois de probabilité pour les fonctions statistiques. Ann. Inst. H. Poincaré,
6, 185. 

von Mises, R. (1937). Bestimmung einer Verteilung durch ihre ersten Momente. Skand. Akt.,
20, 220. 

von Mises, R. (1938). A modification of Bayes’ problem. Ann. Math. Stats., 9, 256.

von Mises, R. (1939a). The limits of a distribution function if two expected values are given.
Ann. Math. Stats., 10, 99. 

von Mises, R. (1939b). An inequality for the moments of a discontinuous distribution. Skand.
Akt., 22, 32. 

von Mises, R. (1939c). Über Aufteilungs- und Besitzungs-Wahrscheinlichkeiten. Rev. Fac. Sci.
Univ. Istamboul, (4), Fasc. 1-2, 145.

von Mises, R. (1941). On the foundations of probability and statistics. Ann. Math. Stats.,
12, 191. 

von Neumann, J., and others (1941a, b). The mean-square successive difference. Ann. Math.
Stats., 12, 153; (von Neumann alone): Distribution of the ratio of the mean-square
successive difference to the variance. Ibid., 12, 367; von Neumann and Hart, B. I.:
Tabulation of the probabilities for the ratio of the mean-square successive difference to
the variance. Ibid., 13, 207.

von Schelling, H. (1934). Die Konzentration einer Verteilung und ihre Abhängigkeit von den
Grenzen des Variationsbereiches. Metron, 11, No. 4, 3. 

von Szeliski, V. S. (1929). Experiments in the correlation of time-series. J. Am. Stat. Ass.,
24, Supp., 241. 


Wald, A. (1936a). Berechnung und Ausschaltung von Saisonschwankungen. Beiträge zur
Konjunkturforschung. Oester. Inst. Konj., 9, 35.

Wald, A. (1936b). Sur la notion de collectif dans le calcul des probabilités. Comptes rendus,
202, 180. 

Wald, A. (1937). Die Widerspruchsfreiheit des Kollektivbegriffs der Wahrscheinlichkeitsrechnung.
Ergeb. math. Kolloqu., Hamburg, No. 8, 38. 

Wald, A. (1938). A generalisation of Markoff’s inequality. Ann. Math. Stats., 9, 244.

Wald, A. (1939a). Contributions to the theory of statistical estimation and testing hypotheses.
Ann. Math. Stats., 10, 299. 

Wald, A., and Wolfowitz, J. (1939b). Confidence limits of continuous distribution functions.
Ann. Math. Stats., 10, 105. 

Wald, A. (1940a). The fitting of straight lines if both variables are subject to error. Ann. Math.
Stats., 11, 284. 

Wald, A. (1940b). A note on the analysis of variance with unequal class-frequencies. Ann.
Math. Stats., 11, 96. 





Wald, A. and Wolfowitz, J. (1940c). On a test whether two samples are from the same
population. Ann. Math. Stats., 11, 147.

Wald, A. (1941a). Asymptotically most powerful tests of statistical hypotheses. Ann. Math.
Stats., 12, 1 and 396. 

Wald, A., and Brookner, R. J. (1941b). On the distribution of Wilks’ statistic, etc. Ann.
Math. Stats., 12, 137. 

Wald, A., and Wolfowitz, J. (1941c). Note on confidence limits for continuous distribution
functions. Ann. Math. Stats., 12, 118. 

Wald, A. (1941d). On the analysis of variance in case of multiple classifications with unequal
class frequencies. Ann. Math. Stats., 12, 346. 

Wald, A. (1942a). Asymptotically shortest confidence intervals. Ann. Math. Stats., 13, 127.

Wald, A. (1943). On the efficient design of statistical investigations. Ann. Math. Stats., 14, 134.

Walker, Sir Gilbert (1914). On the criterion for the reality of relationships or periodicities.
Calcutta Ind. Met. Mems., 21, part 9. 

Walker, Sir Gilbert (1925). On periodicity. Q. J. Roy. Met. Soc., 51, 387.

Walker, Sir Gilbert (1927). On periodicity and its existence in European weather. Mem.
Roy. Met. Soc., 1, No. 9. 

Walker, Sir Gilbert (1931). On periodicity in series of related terms. Proc. Roy. Soc., A,
131, 518. 

Walker, H. M. (1929). Studies in the History of Statistical Method. Williams and Wilkins,
Baltimore. 

Walker, H. M., and Sanford, V. (1934). The accuracy of computation with approximate
numbers. Ann. Math. Stats., 5, 1. 

Wallace, N., and Travers, R. M. W. (1938). A psychometric sociological study of a group of
speciality salesmen. Ann. Eug. Lond., 8, 266.

Wallis, W. A. (1939). The correlation ratio for ranked data. J. Am. Stat. Ass., 34, 533.

Wallis, W. A., and Moore, G. H. (1941). A significance test for time-series. Technical Paper
No. 1. Nat. Bur. Ec. Research. 

Wallis, W. A. (1942). Compounding probabilities from independent significance tests.
Econometrika, 10, 229.

Watkins, G. P. (1933). An ordinal index of correlation. J. Am. Stat. Ass., 28, 139. 

Waugh, F. V. (1942). Regressions between sets of variables. Econometrika, 10, 290.

Webster, M. S. (1938). Orthogonal polynomials with orthogonal derivations. Bull. Am. Math.
Soc., 44, 880. 

Weida, F. M. (1934). On measures of contingency. Ann. Math. Stats., 5, 308.

Weida, F. M. (1935). On certain distribution functions when the law of the universe is Poisson’s
first law of error. Ann. Math. Stats., 6, 102. 

Weiss, M. G., and Cox, G. M. (1939). Balanced incomplete block and lattice-square designs for 
testing yield differences among large numbers of soya bean varieties. Iowa Agr. Exp.
Stat. Res. Bull., 257, 289. 

Welch, B. L. (1935). Some problems in the analysis of regression among k samples of two variables.
Biom., 27, 145. 

Welch, B. L. (1936a). Note on an extension of the L₁ test. Stat. Res. Mem., 1, 52.

Welch, B. L. (1936b). Specification of rules for rejecting too variable a product, etc. Supp.
J.R.S.S., 3, 29.

Wetcu, B. L. (1937). On the z-test in randomised blocks and Latin squares. Biom., 29, 21. 

Wetca, B. L. (1938a). On tests for homogeneity. Biom., 30, 149. 

We cu, B. L. (19380). The significance of the difference between two means when the population 
variances are unequal. Biom., 29, 350. 

Werton, B. L. (19394). On confidence limits and sufficiency with particular reference to parameters 
of location. Ann. Math. Stats., 10, 58. 

Wetca, B. L. (19396). Note on discriminant functions. Biom., 31, 218. 


Welch, B. L. (1939c). On the distribution of maximum likelihood estimates. Biom., 31, 187.

Wertheimer, A. (1932). A generalised error function. Ann. Math. Stats., 3, 64.

Wertheimer, A. (1937). Note on Zoch’s paper on the postulate of the arithmetic mean. Ann.
Math. Stats., 8, 112.

Wherry, R. J. (1935). The shrinkage of the Brown-Spearman prophecy formula. Ann. Math.
Stats., 6, 183.

Whitaker, L. (1914). On Poisson’s law of small numbers. Biom., 10, 36.

Whittaker, E. T., and Robinson, G. (1940). The Calculus of Observations. 3rd edn. Blackie
& Sons.

Whitworth, W. A. (1901). Choice and Chance. 5th edn. Deighton Bell and Co. Cambridge.

Wicksell, S. D. (1917a). On logarithmic correlation with an application to the distribution of
ages at first marriage. Medd. Lunds Astr. Obs., No. 84.

Wicksell, S. D. (1917b). The correlation function of Type A. Kungl. Svenska Vetenskapsakad.
Handl. Bd. 58; Medd. Lunds Astr. Obs. Series 2, No. 17.

Wicksell, S. D. (1921). An exact formula for spurious correlation. Metron, 1, No. 4, 33.

Wicksell, S. D. (1933). On correlation functions of Type III. Biom., 25, 121.

Wicksell, S. D. (1934a). Expansions of frequency functions for integer variates in series. Skand.
Matematikercongressen i Stockholm, p. 306.

Wicksell, S. D. (1934b). Analytical theory of regression. Medd. Lunds Astr. Obs. Series 2,
No. 69.

Widder, D. V. (1934). The inversion of the Laplace integral and the related moment problem.
Trans. Am. Math. Soc., 36, 107.

Wiener, N. (1930). Generalised harmonic analysis. Acta Math., 55, 117.

Wiener, N. (1938). The homogeneous chaos. Am. J. Math., 60, 897.

Wilks, S. S. (1932a). Moments and distributions of estimates of population parameters from
fragmentary samples. Ann. Math. Stats., 3, 163.

Wilks, S. S. (1932b). On the sampling distribution of the multiple correlation coefficient. Ann.
Math. Stats., 3, 196.

Wilks, S. S. (1932c). On the distribution of statistics in samples from a normal population of two
variables with matched sampling for one variable. Metron, 9, Nos. 3-4, 87.

Wilks, S. S. (1932d). The standard error of a tetrad in samples from a normal population of
independent variables. Proc. Nat. Acad. Sci., 18, 562.

Wilks, S. S. (1932e). Certain generalisations in the analysis of variance. Biom., 24, 471.

Wilks, S. S. (1934). Moment-generating operators for determinants of product-moments in
samples from a normal system. Ann. Math., 35, 312.

Wilks, S. S. (1935a). The likelihood test of independence in contingency tables. Ann. Math.
Stats., 6, 190.

Wilks, S. S. (1935b). On the independence of k sets of normally distributed statistical variables.
Econometrika, 3, 309.

Wilks, S. S. (1935c). Test criteria for statistical hypotheses involving several variables. J. Am.
Stat. Ass., 30, 549.

Wilks, S. S. (1936). The sampling theory of systems of variances, covariances and intra-class
covariances. Am. J. Math., 58, 426.

Wilks, S. S., and Thompson, C. M. (1937a). The sampling distribution of the criterion λ_{H_1} when
the hypothesis tested is not true. Biom., 29, 124.

Wilks, S. S. (1937b). The analysis of variance for two or more variables. Third Ann. Conf.
Econ. Stat. Colorado Springs, p. 82.

Wilks, S. S. (1938a). The large-sample distribution of the likelihood ratio for testing composite
hypotheses. Ann. Math. Stats., 9, 60.

Wilks, S. S. (1938b). Shortest average confidence intervals from large samples. Ann. Math.
Stats., 9, 166.

Wilks, S. S. (1938c). Fiducial distributions in fiducial inference. Ann. Math. Stats., 9, 272.


Wilks, S. S. (1938d). Weighting systems for linear functions of correlated variables when there
is no dependent variable. Psychometrika, 3, 23.

Wilks, S. S. (1938e). The analysis of variance and covariance in non-orthogonal data. Metron,
13, No. 2, 141.

Wilks, S. S. (1939a). Optimum fiducial regions for simultaneous estimation of several population
parameters for large samples. (Abstract). Ann. Math. Stats., 10, 85.

Wilks, S. S., and Daly, J. F. (1939b). An optimum property of confidence regions associated
with the likelihood function. Ann. Math. Stats., 10, 225.

Wilks, S. S. (1941). On the determination of sample sizes for setting tolerance limits. Ann.
Math. Stats., 12, 91.

Wilks, S. S. (1943). Mathematical Statistics. Princeton University Press.

Williams, C. B. (1937). The use of logarithms in the interpretation of certain entomological
problems. Ann. App. Biol., 24, 404.

Williams, J. D. (1941). Moments of the ratio of the mean-square successive difference to the
mean-square difference in samples from a normal population. Ann. Math. Stats., 12, 239.

Wilsdon, B. H. (1934). Discrimination by specification statistically considered and illustrated
by the standard specification for Portland cement. Supp. J.R.S.S., 1, 152.

Wilson, E. B. (1928). On hierarchical correlation systems. Proc. Nat. Acad. Sci., 14, 283.

Wilson, E. B., and Hilferty, M. M. (1931a). The distribution of chi-square. Proc. Nat. Acad.
Sci., 17, 694.

Wilson, E. B., Hilferty, M. M., and Maher, H. C. (1931b). Goodness of fit. J. Am. Stat. Ass.,
26, 443.

Wilson, E. B. (1938). The standard deviation of sampling for life expectancy. J. Am. Stat. Ass.,
33, 705.


Wintner, A. (1934a). On the addition of independent distributions. Am. J. Maths., 56, 8.

Wintner, A. (1934b). On the asymptotic differential distribution of almost periodic and related
functions. Am. J. Maths., 56, 401.

Wintner, A. (1935). Papers on convergent convolutions. Am. J. Maths., 57, 363, 821, 827,
839; and Bull. Am. Math. Soc., 41, 137.

Wintner, A. (1936). On a class of Fourier transforms. Am. J. Maths., 58, 45. 

Wishart, J. (1926). On Romanovsky’s generalised frequency curves. Biom., 18, 221.

Wishart, J. (1927). On the approximate quadrature of certain skew curves with an account of
the researches of Thomas Bayes. Biom., 19, 1.

Wishart, J. (1928). The generalised product-moment distribution in samples from a normal
multivariate population. Biom., 20A, 32.

Wishart, J. (1929a). The correlation between product-moments of any order in samples from
a normal population. Proc. Roy. Soc. Edin., 49, 1.

Wishart, J. (1929b). A problem of combinatorial analysis giving the distribution of certain
moment-statistics. Proc. Lond. Math. Soc., 29, 309.

Wishart, J. (1930). The derivation of certain high-order sampling product-moments from a
normal population. Biom., 22, 224.

Wishart, J. (1931a). Notes on frequency constants. J. Inst. Act., 62, 174.

Wishart, J. (1931b). The mean and second-moment coefficient of the multiple correlation
coefficient in samples from a normal population. Biom., 22, 353.

Wishart, J. (1932a). A note on the distribution of the correlation ratio. Biom., 24, 441.

Wishart, J., and Bartlett, M. S. (1932b). The distribution of second-order moment coefficients
in small samples. Proc. Camb. Phil. Soc., 28, 455.

Wishart, J. (1933a). The theory of orthogonal polynomial fitting. J.R.S.S., 96, 487.

Wishart, J. (1933b). A comparison of the semi-invariants of the distributions of moment and
semi-invariant estimates in samples from an infinite population. Biom., 25, 52.

Wishart, J., and Bartlett, M. S. (1933c). The generalised product-moment distribution in a
normal system. Proc. Camb. Phil. Soc., 29, 260.


Wishart, J. (1934a). Statistics in agricultural research. Supp. J.R.S.S., 1, 26.

Wishart, J. (1934b). Bibliography of agricultural statistics. Supp. J.R.S.S., 1, 95.

Wishart, J., and Sanders, H. G. (1935). Principles and Practice of Field Experimentation.
Empire Cotton-growing Corporation, London.

Wishart, J. (1936). Tests of significance in analysis of covariance. Supp. J.R.S.S., 3, 79.

Wishart, J. (1938). Field experiments of factorial design. J. Agr. Sci., 28, 299.

Wishart, J. (1939). Statistical treatment of animal experiments. Supp. J.R.S.S., 6, 1.

Wisniewski, J. (1934). Interdependence of cyclical and seasonal fluctuation. Econometrika,
2, 176.

Wisniewski, J. (1935, 1936). On the validity of a certain Pearson’s formula. Biom., 27, 356;
and: Rejoinder. Biom., 28, 190.

Wisniewski, J. (1937a). A problem in least squares. Ann. Math. Stats., 8, 145.

Wisniewski, J. (1937b). A note on inverse probability. J.R.S.S., 100, 417.

Wold, H. (1934a). Sulle correzione di Sheppard. Giorn. Ist. Ital. Att., 4, 304.

Wold, H. (1934b). Sheppard’s correction formulae in several variables. Skand. Akt.,
17, 248.

Wold, H. (1935). A study on the mean difference, concentration curves and concentration ratio.
Metron, 12, No. 2, 39.

Wold, H. (1936). On quantitative statistical analysis. Skand. Akt., 19, 281.

Wold, H. (1938a). A Study in the Analysis of Stationary Time-Series. Almquist and Wiksells,
Uppsala.

Wold, H. (1938b). On the inversion of moving averages. Skand. Akt., 21, 208.

Wold, H. (1939). Über stochastische Prozesse, insbesondere solche stationärer Natur. 9 Cong.
des Math. Scand. Helsingfors, p. 207.

Wolfowitz, J. (1942). Additive partition functions and a class of statistical hypotheses. Ann.
Math. Stats., 13, 247.

Wolfowitz, J. (1943). On the theory of runs with some applications to quality control. Ann.
Math. Stats., 14, 280.

Wong, Y. K. (1935). An application of the orthogonalisation process to the theory of least squares.
Ann. Math. Stats., 6, 58.

Wong, Y. K. (1937). On the elimination of variables in multiple correlation. J. Am. Stat. Ass.,
32, 357.

Woodbury, M. A. (1940). Rank correlation when there are equal variates. Ann. Math. Stats.,
11, 358.

Working, H., and Hotelling, H. (1929). Applications of the theory of error to the interpretation
of trends. J. Am. Stat. Ass., 24, Supp., 73.

Wright, S. (1934). The method of path coefficients. Ann. Math. Stats., 5, 161.


Yasukawa, K. (1925). On the means, standard deviations, correlations and frequency-distributions
of functions of variates. Biom., 17, 211.

Yasukawa, K. (1926). On the probable error of the mode of frequency-distributions. Biom.,
18, 263.

Yasukawa, K. (1934). On the deviation from normality of the frequency-distributions of functions
of normally distributed variates. Tohoku Math. J., 38, 465.

Yates, F. (1933a). The principles of orthogonality and confounding in replicated experiments.
J. Agr. Sci., 23, 108.

Yates, F. (1933b). The analysis of replicated experiments when the field results are incomplete.
Emp. J. Exp. Agr., 1, 129.

Yates, F. (1933c). The formation of Latin squares for use in field experiments. Emp. J. Exp.
Agr., 1, 235.

Yates, F. (1934a). The analysis of multiple classifications with unequal numbers in the different
classes. J. Am. Stat. Ass., 29, 51.


Yates, F. (1934b). Contingency tables involving small numbers and the χ²-test. Supp. J.R.S.S.,
1, 217.

Yates, F. (1935a). Some examples of biassed sampling. Ann. Eug. Lond., 6, 202.

Yates, F. (1935b). Complex experiments. Supp. J.R.S.S., 2, 181.

Yates, F., and Zacopanay, I. (1935c). The estimation of the efficiency of sampling, with special
reference to sampling for yield in cereal experiments. J. Agr. Sci., 25, 545.

Yates, F. (1936a). Incomplete Latin squares. J. Agr. Sci., 26, 301.

Yates, F. (1936b). Incomplete randomised blocks. Ann. Eug. Lond., 7, 121.

Yates, F. (1936c). Applications of the sampling technique to crop estimation and forecasting. 
Trans. Manchester Stat. Soc., 103. 

Yates, F. (1936d). A new method of arranging variety trials involving a large number of varieties.
J. Agr. Sci., 26, 424.

Yates, F. (1937a). A further note on the arrangement of variety trials. Quasi-Latin squares.
Ann. Eug. Lond., 7, 319.

Yates, F. (1937b). The design and analysis of factorial experiments. Imp. Bur. Soil Sci. Tech.
Comm., No. 35.

Yates, F. (1938a). The gain in efficiency resulting from the use of balanced designs. Supp.
J.R.S.S., 5, 70.

Yates, F., and Cochran, W. G. (1938b). The analysis of groups of experiments. J. Agr. Sci.,
28, 556.

Yates, F. (1938c). Orthogonal functions and tests of significance in the analysis of variance.
Supp. J.R.S.S., 5, 177.

Yates, F. (1939a). The recovery of inter-block information in variety trials arranged in
three-dimensional lattices. Ann. Eug. Lond., 9, 136.

Yates, F., and Hale, R. W. (1939b). The analysis of Latin squares when two or more rows,
columns or treatments are missing. Supp. J.R.S.S., 6, 67.

Yates, F. (1939c). The adjustment of the weights of compound index numbers based on inaccurate
data. J.R.S.S., 102, 285.

Yates, F. (1939d). Tests of significance of the differences between regression coefficients derived
from two sets of correlated variates. Proc. Roy. Soc. Edin., 59, 184.

Yates, F. (1939e). The comparative advantages of systematic and randomised arrangements in
the design of agricultural and biological experiments. Biom., 30, 440. 

Yates, F. (1939f). An apparent inconsistency arising from tests of significance based on fiducial 
distributions of unknown parameters. Proc. Camb. Phil. Soc., 35, 579. 

Yates, F. (1940). The recovery of inter-block information in balanced incomplete block designs. 
Ann. Eug. Lond., 10, 317. 

Young, A. W., and Pearson, K. (1916). On the probable error of a coefficient of contingency
without approximation. Biom., 11, 215. (Correction, Biom., 12, 259.)

Young, L. C. (1941). On randomness in ordered sequences. Ann. Math. Stats., 12, 293.

Yule, G. U. (1897a). On the significance of Bravais’ formulae for regression, etc., in the case of
skew correlation. Proc. Roy. Soc., A, 60, 477.

Yule, G. U. (1897b). On the theory of correlation. J.R.S.S., 60, 812.

Yule, G. U. (1900). On the association of attributes etc. Phil. Trans., A, 194, 257.

Yule, G. U. (1906). On a property which holds good for all groupings of a normal distribution,
etc. Proc. Roy. Soc., A, 77, 324.

Yule, G. U. (1907). On the theory of correlation for any number of variables treated by a new
system of notation. Proc. Roy. Soc., A, 79, 182.

Yule, G. U. (1910). On the interpretation of correlations between indices or ratios. J.R.S.S.,
73, 644.

Yule, G. U. (1912). On the methods of measuring the association between two attributes.
J.R.S.S., 75, 519.

Yule, G. U. (1921). On the time-correlation problem. J.R.S.S., 84, 497.


Yule, G. U. (1922). On the application of the χ² method to association and contingency tables,
with experimental illustrations. J.R.S.S., 85, 95.

Yule, G. U. (1926). Why do we sometimes get nonsense correlations between time-series, etc.?
J.R.S.S., 89, 1.

Yule, G. U. (1927a). On a method of investigating periodicities in disturbed series, with special
reference to Wolfer’s sunspot numbers. Phil. Trans., A, 226, 267.

Yule, G. U. (1927b). On reading a scale. J.R.S.S., 90, 570.

Yule, G. U. (1936). On a parallelism between differential coefficients and regression coefficients.
J.R.S.S., 99, 770.

Yule, G. U. (1938a). A test of Tippett’s random sampling numbers. J.R.S.S., 101, 167.

Yule, G. U. (1938b). On some properties of normal distributions, univariate and bivariate, based
on sums of squares of frequencies. Biom., 30, 1.


Zaycoff, R. (1936). Über die Zerlegung statistischer Zeitreihen in drei Komponenten. Stat. Inst.
Econ. Res. Univ. Sofia, No. 4.

Zaycoff, R. (1937). Über die Ausschaltung der zufälligen Komponente nach der
Variate-difference-Methode. Stat. Inst. Econ. Res. Univ. Sofia, No. 1.

Zia-ud-Din, M. (1938). On differential operators developed by O’Toole. Ann. Math. Stats.,
9, 63.

Zoch, R. T. (1934). Invariants and covariants of certain frequency curves. Ann. Math. Stats.,
5, 124.

Zoch, R. T. (1935, 1937). On the postulate of the arithmetic mean. Ann. Math. Stats., 6, 171;
and: Reply to Mr. Wertheimer’s paper. Ibid., 8, 117.

Zrzavy, F. J. (1933). Ausschaltung von Saisonschwankungen mittels Lag-correlation. Monatsber.
der Oest. Inst. für Konjunkturforschung. Wien.


INDEX 


(References are to pages. The abbreviations “N.R.” and “Bibl.” refer to the Notes
and References and to the Bibliography respectively. Greek letters are indexed under
their Roman equivalents, e.g. χ² under Chi-squared and ω under Omega.)


Acceptance, region of, 63, 76. 

Accidents, see Industrial Accidents. 

Accuracy, of an estimator, 28-9; loss of, 30-2.

Accuracy of calculation, Bibl., Walker and Sanford (1934) 498.

Addition of variate, in regression analysis, 167-70.

Additive functions, Bibl.: Erdös and Kac (1939),
Erdös (1939), Erdös and Wintner (1939)
459.

Admissible functions, see Random Sequence. 

Adyanthaya, A. K., distribution of t in non-normal
case, 103.

Age and audible pitch, (Example 22.4) 152-3, 
(Example 22.5) 155-6. 

Agricultural statistics, bibliography of, Bibl.,
Wishart (1934a, b) 501.

Aitken, A. C., minimum variance, 51, (Exercises
18.1 and 18.2) 61; N.R., 61, 173.

Allan, F. E., orthogonal polynomials, 161,
(Exercise 22.4) 173; N.R., 173, 245.

Almost periodic functions, Bibl.: Besicovitch 
(1932) 446, Bohr (1925) 447, Hartman 
and others (1938) 467, Kerchner and 
Wintner (1936) 473, van Kampen (1939a) 
496, Wintner (1934b) 500.

Alter, D., N.R., 437. 

Amount of information, in estimation, 29-30. 

Analysis of variance, generally, 175-246; one-way
classifications, 175-6; two-way classifications,
181-7; three-way classifications, 187-8;
interactions, 188-9; n-way classifications,
189-98; arithmetic of, 198-9; z-test in, 199;
factorial experiments, 199-202; in non-normal
data, 205-16; variate transformations, 206-9;
randomisation, 209-13; randomised blocks,
213-14; ranking tests, 214-15; estimation of
class-differences, 218-19; different numbers in
sub-classes, 220-8; factorial classifications,
228-9; missing plot technique, 229-33;
relation with regression analysis, 233-7;
covariance analysis, 237-45.

Bibl.: Bartlett (1936d,e) 445; Beall 
(1942) 446; Bliss (19388) 447; Brandt 
(1933) 449; Clark and Leonard (1939) 452 ; 
Cochran (1935, 1937b, 1939b, 1940b) 452;
Comrie and others (1937) 452; Curtiss
(1943) 454; Daniels (1938b) 455; Fieller
(1940) 460; Hendricks (1935) 468; P. L.
Hsu (1940, 1941b) 469; Irwin (1931, 1934,
1942) 470; E. S. Pearson (1931b) 482;
Roy (1939b, 1942a, b) 489; Schultz and
Snedecor (1933) 490; Snedecor and Cox
(1934a) 492; Snedecor (1934b) 492; P. C.
Tang (1938) 494; Wald (1940b) 497,
(1941d) 498; Wilks (1932e, 1937b) 499,
(1938e) 500; Yates (1938c) 502.

See also Fisher’s Distribution, Replica- 
tion, Blocks, Design, etc. 

Analysis situs, Bibl., Hotelling (1927) 469. 

Ancillary estimators, 32-3. 

Anderson, O., variate-difference method, 391, 393. 
N.R., 394. 

Andersson, W., N.R., 172; (Exercise 22.5) 174.

André, D., N.R., 136. 

Animal experiments, Bibl., Wishart (1939) 501. 

Association, Bibl.: S. S. Bose and Mahalanobis
(1938a) 448, M. Greenwood and Yule (1915) 
466, K. Pearson and Heron (1913c) 484, 
K. Pearson (1913d) 484, Yule (1900, 1912) 
502. 

Asymmetrical frequency-distributions, Bibl.,
Hansmann (1934) 467. See also Gram-Charlier
Series, Pearson Distributions.

Asymptotic distributions, Bibl., Hartman and 
others (1939) 468, Haviland (1939) 468. 
See also Convergence in Probability. 

Attributes, significance in k samples, 119-20. 

Attributes, sub-sampling for, Bibl., Bartlett (1937a) 445.

Autocorrelation, see Serial Correlation, Correlogram.

Autocorrelation function, 421-3.

Autoregression equations, 399; (Table 30.4) 401;
406-8; period of, 414-21. See also Serial
Correlation, Correlogram.

Average, accuracy of, Bibl.: Bowley (1912) 448, 
Keynes (1911) 473. See also Mean, Median, 
Mode. 


Balance, in design, 263-5. Bibl.: R. C. Bose
(1939) 448, R. C. Bose and Nair (1939) 448,
R. C. Bose (1942a) 448, Cox (1940) 453,
K. R. Nair and Rao (1942) 479, Neyman and
Pearson (1938d) 480, E. S. Pearson (1937b,
1938) 483, “Student” (1938) 493, Weiss
and Cox (1939) 498, Yates (1938a, 1940)
502.

Barbacki, S., N.R., 266. 

Barley yields, (Table 29.1, Figure 29.1) 364. 

Barnard, M. M., (Example 28.3) 345-8; N.R., 359.



Bartels, J., N.R., 437. 

Bartlett, M. S., distribution of t, 103; conditional
tests, 127; k samples, 299, 323; stabilising
variance, 207-8; Wishart’s distribution,
333. Exercises from: (21.7) 139, (21.10)
139, (21.11, 21.13, 21.14) 140; (27.2) 326,
(28.2) 360, (28.12) 362. N.R., 45, 83, 94,
136, 245, 304, 359, 437.

Bayes’ theorem and postulate, in estimation, 58-9;
in relation to fiducial inference, 90-1, 93-4.
Bibl.: Bayes (1763) 446, Berkson (1930)
446, Burnside (1924) 450, Molina (1931)
478, E. S. Pearson (1925) 482, K. Pearson
(1920a) 485, von Mises (1938) 497, Wishart
(1927) 500.

Beall, G., N.R., 216. 

Behrens’ test, 82, 91-4, 111-12. See Two Samples. 

Belonging coefficient, Bibl., Kullback (1935c) 474. 

Bessel function distribution, (Exercise 28.2) 359-60.
Bibl.: R. C. Bose (1938a) 448, S. S.
Bose (1938a) 448, Fieller (1932a) 460,
McKay (1932) 477, K. Pearson (1933a) 486,
K. Pearson and others (1932a) 486.

Best critical regions, 272, 275-8. 

Beta (measure of skewness and kurtosis), Bibl.,
McKay (1938) 477. 

Beta-function, Bibl., Miller (1931) 479, Thompson 
and others (194la) 494. 

Beveridge, Sir William, (Table 30.1) 396, N.R., 
437. See Wheat-price Index. 

Bias, in estimation, 3-4; in statistical tests, 
307-27. Bibl.: Daly (1940) 454, Neyman 
and Pearson (1936, 1938) 480, Neyman 
(1935b) 480, Yates (1935a) 502. 

Bimodal distributions, transformations of, Bibl.,
Baker (1930a) 444. 

Binomial, confidence intervals for, (Example 19.2)
66-9; tables of, 81.

Binomial, generally, Bibl.: Ayyangar (1934) 444,
Camp (1924) 450, Clopper and Pearson
(1934) 452, Cochran (1936a, 1937a, 1940b)
452, Fisher (1941b) 462, Greenwood and
Yule (1920) 466, Kullback (1935b) 474,
Lurquin (1937) 476, K. Pearson (1915b)
484, Romanovsky (1923) 489.

Biological assays, Bibl., Irwin (1937b) 470.

Births, proportion of males in, (Example 21.8) 120. 

Biserial coefficients, Bibl.: Newbold (1925) 479, 
K. Pearson (1909, 1910) 484, (1917) 485, 
Soper (1914) 492. 

Bishop, D. J., N.R., 304, 359. 

Bivariate surfaces, Bibl.: Narumi (1923a) 479,
Nicholson (1943) 481, Pretorius (1930) 487, 
Rhodes (1923, 1925) 488, Ritchie-Scott 
(1921) 489, Villars and Anderson (1943) 
496. 

Blocks, randomised, 213-14. Bibl.: R. C. Bose
(1939) 448, R. C. Bose and Nair (1939) 448,
R. C. Bose (1942a) 448, Cornish (1940a, b, c)
453, Cox (1940) 453, Fisher (1940b, 1942a)
462, Goulden (1937) 465, Kishen (1942)
473, Nair and Rao (1942) 479, Nair (1943)
479, Savur (1939) 490, Yates (1936b, 1939a,
1940) 502.

Bose, C., N.R., 266. 

Bose, R. C., N.R., 359. 

Bowley, A. L., N.R., 266. 

Brady, J., N.R., 245. 

Brandt, A. E., (Example 24.1) 221-5, N.R., 245. 

Breeds of pig, (Example 24.1) 221-5, (Example 
24.2) 225, (Example 24.3) 226-7, (Example 
24.4) 229. 

Brookner, R. J., N.R., 304. 

Brown, G. W., bias in tests, 323, N.R., 304. 

Brown-Spearman formula, Bibl., Wherry (1935)
499.

Bruns, H., N.R., 437. 

Brunt, D., rainfall data, (Table 29.4) 367, N.R., 
437. 

Burr, I. W., distribution functions, 440. 

Buys-Ballot table, 430. 


Calculating machines, Bibl.: Comrie (1936) 452,
Hey (1938) 468, Mallock (1933) 477.
Canonical correlations, 348-58. Bibl.: Bartlett
(1941) 445, Hotelling (1936b) 469, P. L.
Hsu (1941a) 469. See Multivariate Analysis.
Carleman criterion, 440. 
Cauchy population, estimation of location, 2, 
(Example 18.2) 51; median in, (Example 
17.4) 6; approximation to estimator for, 
(Example 17.11) 23-4; loss of information, 
(Example 17.16) 32. 
Cave, B. M., N.R., 394. 
Cement, specification of, Bibl., Wilsdon (1934) 500.
Central confidence intervals, 66.
Central limit theorem, Bibl.: Bernstein (1927,
1936) 446, Bochner (1936) 447, Feller
(1936b, 1937) 460, Gnedenko (1938) 465,
Liapounoff (1900, 1901) 476, Lindeberg
(1922) 476, Madow (1939) 476, Pólya (1920)
487. See Convergence in Probability.
Centre of location, 41.
Chains, in probability, see Markoff Process.
Characteristic equation, Bibl., Horst (1935) 469,
Samuelson (1942) 490.
Characteristic functions, Bibl.: Boas and Smithies
(1937) 447, Dugué (1939) 458, Glivenko
(1936) 465, Haviland (1934b, 1935) 468,
Kullback (1934, 1936b) 474, Kunetz (1936)
474, Wintner (1936) 500.
Charlier’s series, see Gram-Charlier Series.
Chi-squared (χ²), minimum, 55-8; in testing
goodness of fit, 106-7; in testing hypotheses,
299, 302; generalisation in multivariate
analysis, see Wishart’s Distribution.

Chi-squared, generally, Bibl.: Aroian (1943) 444,
Berkson (1938) 446, Brownlee (1924a) 449,
Camp (1938b) 450, Cochran (1936a, 1942a)
452, Deming (1934, 1938) 456, Eisenhart
(1938) 459, El Shanawany (1936) 459, Fisher
(1922a, 1928c, 1924d) 461, Fry (1938) 464,
Grüneberg and Haldane (1937) 466, Gumbel
(1943b) 466, Haldane (1937, 1938, 1939,
1940) 467, Hoel (1938) 468, Irwin (1929b)
470, Jeffreys (1938b, 1939b) 471, Johnson
and Welch (1939) 471, Koshal (1939) 474,
Mann and Wald (1942) 477, Merrington
(1941) 478, Neyman and Pearson (1931a)
480, K. Pearson (1900c) 483, (1916e, f,
1922a, 1923) 485, (1932b) 486, Robinson
(1933) 489, Seal (1940) 490, K. Smith (1916)
492, Snedecor and Irwin (1933) 492, Sukhatme
(1937a, 1938a) 494, C. M. Thompson
(1941b) 494, Wilson and Hilferty (1931a)
500, Wilson and others (1931b) 500, Yates
(1934b) 502, Yule (1922) 503.

Clitic curve, 142. 

Clopper, C. J., confidence limits for a binomial, 81. 

Closeness, in estimation, Bibl., Geary (1944) 464. 

Closure, Bibl., Stekloff (1914) 492. 

Cochran, W. G., on Fisher’s distribution, 117, 199;
elimination of variates, 170, (Example 
22.10) 171; theorem on sum of squares, 
177-8; N.R., 136, 216. 

Cograduation, Bibl., Gini (1939) 465, Salvemini 
(1939) 490. 

Combination of tests, 132-3. Bibl. : David (1934) 
455, E. S. Pearson (1938) 483, K. Pearson 
(1933b) 486, Wallis (1942) 498. 

Combination of observations, Bibl.: Bruen (1938) 449,
Brunt (1931) 449, Mather (1935) 477. See
Errors, general theory of.

Compatible events, Bibl., Gumbel (1938b) 466.

Complete sufficiency, in estimation, 40. 

Complex experiments, Bibl., Yates (1935b) 502. 
See Design. 

Composite hypothesis, 269, 282-3, 287-92, 316-17. 

Compound frequency-distributions, Bibl.,
Helguero (1906) 468, K. Pearson (1915b) 484.
See Bimodal.

Concentration, Bibl. : Castellano (1933a, b, 1937) 
451, Galvani (1932) 464, Gini (1932) 465, 
Pietra (1932a) 486, von Schelling (1934) 
497, Wold (1935) 501. 

Concordance, Bibl., Gini (1916) 465. 

Concordant samples, 128. 

Conditional statistics, (Exercise 21.10) 139; N.R.,
45. Bibl., Bartlett (1938b) 445.

Conditional tests, 127-8, 134.

Confidence, belt, 63; coefficient, 63; intervals,
62-84; for one parameter, 62-5; central
and non-central, 66-9; for large samples,
69-71; shortest sets, 71-4; sufficient
estimators, 74-5; for several parameters,
76-9, 81-2; studentisation in determining,
79-81; tables of, 81; limits, 63.

Bibl.: Clopper and Pearson (1934) 452,
David (1937, 1938a) 455, Frankel and Kullback
(1940) 463, Kolmogoroff (1941) 474,
K. R. Nair (1940b) 479, Neyman (1937b,
1941a) 480, E. S. Pearson (1932) 482,
Pearson and Sukhatme (1935b) 482, Ricker
(1937) 488, W. R. Thompson (1936) 494,
Wald and Wolfowitz (1939b) 497, (1941c)
498, Wald (1942a) 498, Welch (1939a) 498,
Wilks (1938b, c) 499, (1939a) 500, Wilks
and Daly (1939b) 500.

Configuration of sample, 127. 

Confluence analysis, Bibl.: Cobb (1939) 452,
Frisch (1934b) 464, Mendershausen (1939)
478, Reiersöl (1940, 1941) 488.

Conformity, index of, Bibl., Solomon (1939) 492.

Confounding, 262-3. Bibl.: Barnard (1936) 444,
R. C. Bose and Kishen (1941) 448, Fisher
(1942c) 462, K. R. Nair (1938b, 1941) 479,
Yates (1933a) 501. See Design.

Consistence, of estimators, 3, 12-15.

Contagious distributions, Bibl., Feller (1943) 460, 
Neyman (1939a) 480. 

Contingency, Bibl.: Bartlett (1935b) 445, Blakeman
and Pearson (1906) 447, Harris and
Treloar (1927) 467, Hirschfeld (1935) 468,
Kondo (1929) 474, K. Pearson and Blakeman
(1906) 484, K. Pearson (1900a, b) 483,
(1904) 484, (1916b) 485, Stevens (1938a)
493, Weida (1934) 498, Wilks (1935a) 499,
Yates (1934b) 502, Young and Pearson
(1916) 502.

Continuous spectrum, in periodogram, 433.

Convergence in probability, Bibl. : Cantelli (1916, 
1917, 1923, 1933a, 1935) 450, Cramér (1934) 
454, Dodd (1926, 1927) 456, Doeblin (1938, 
1939) 457, Dugué (1937a) 458, Feller (1937) 
460, Fréchet (1930) 463, Jordan (1933) 472, 
Kolmogoroff (1937a) 473, Kozakiewicz 
(1937, 1938) 474, Lévy (1935, 1936c, 1939a) 
475, Messina (1933) 478, Romanovsky 
(1932b) 489, Slutzky (1925, 1937a) 491.
See also Central Limit Theorem. 

Convolutions, Bibl., van Kampen (1937a) 496, van
Kampen and Wintner (1937b, c) 496.

Cornish, E. A., on Fisher’s distribution, 116,
N.R., 136.

Corrections, for grouping see Grouping Corrections;
to correlations, Bibl., Roff (1937) 489.

Correlated observations, sampling from, Bibl.:
A. T. Craig (1933b) 453, C. C. Craig (1931)
453, (1932) 454, Rhodes (1927) 488. See
also Time-series.


Correlation, confidence intervals for coefficient,
81; Pitman’s test for, 131-2; significance
of, 235.

Bibl.: Baker (1930b) 444, Bilham (1926)
447, Bispham (1920, 1923) 447, Bonferroni
(1939) 447, Brander (1933) 449, W. Brown
(1909) 449, Brownlee (1910, 1925) 449,
Cheshire and others (1932) 451, Cochran
(1937a) 452, Coleman (1932) 452, Cowles
and Chapman (1935) 453, Day and Fisher
(1937) 455, David (1937, 1938) 455, G. R.
Davies (1930) 455, de Lury (1938) 456,
Deming (1937) 456, Dieulefait (1934a,
1935a) 456, S. C. Dodd (1937) 457, Dunlap
(1931) 458, Eells (1929) 459, Ezekiel (1930a)
459, Fischer (1933a, b) 460, Fisher (1915,
1918, 1921c, 1924a) 461, Fréchet (1933)
463, Frisch (1929) 463, Frisch and Mudgett
(1931) 463, Garwood (1933) 464, Geary
(1927) 464, Gehlke and Biehl (1934) 464,
Geiringer (1933) 464, Jeffreys (1939c) 471,
Khintchine (1928) 473, Kuzmin (1939) 474,
Lindblad (1937) 476, Merzrath (1933) 478,
A. N. K. Nair (1942) 479, Newbold (1925)
479, E. S. Pearson (1923, 1924, 1931a, 1932)
482, K. Pearson (1897b, 1900a, b, 1902a)
483, (1904, 1905, 1907a, 1909, 1910, 1913a, b,
1914, 1921) 484, (1920b, 1925b) 485, Pitman
(1939c) 486, Prokopovic (1935) 487, Quensel
(1938) 487, Rider (1932) 488, Romanovsky
(1925a) 489, Soper (1913, 1914, 1917) 492,
Steffensen (1934) 492, Stouffer (1934,
1936a, b) 493, “Student” (1908b) 493,
Thorndike (1937) 494, Thouless (1939) 495,
Tschuprow (1925, 1928) 495, (1934) 496,
Wicksell (1917a, b, 1921, 1933) 499, Yasukawa
(1925) 501, Yule (1897a, b, 1906,
1907, 1910) 502.

See also Multiple Correlation, Regression.
Correlation ratio, Bibl.: Hotelling (1925) 469, Isserlis
(1914, 1916) 470, Kelley (1935) 472, Musselman
(1926) 479, E. S. Pearson (1927) 482,
K. Pearson (1905, 1910, 1911a, b, 1915a)
484, (1917, 1923b) 485, “Student” (1913)
493, Wallis (1939) 498, Wishart (1932a) 500.
Correlogram, 404-12; significance of, 412-13; of 
general linear series, 420-1; relation with 
periodogram, 432-3. 
Cost of living, Bibl. : Bennett (1920) 446, Bowley 
(1919) 448, Konos (1939) 474. 
Cotton yarn, Bibl., Tippett (1935) 495. 
Counting experiments, Bibl., Peierls (1935) 486, 
Tippett (1932) 495. 
Coutts, J. R. H., data from, (Table 22.1) 150. 
Covariance, analysis of, 237-45. Bibl.: Bailey 
(1931) 444, Bartlett (1935d, 1936c) 445, 
Brady (1935) 449, Cochran (1934) 452, 
Cornish (1940c) 453, Cox and Snedecor
(1936) 453, Hirschfeld (1937) 468, K. R. 
Nair (1940a) 479, Snedecor (1935) 492, Wilks 
(1936) 499, (1938c) 500, Wishart (1936) 501. 

Covariance, distribution of, (Example 28.1) 334-5. 

Cramér, H., ω²-test, 108-9; Carleman criterion,
440. 

Critical region, 270, (Example 27.2) 312-13. 

Crop estimation, Bibl., Yates (1936c) 502. 

Crum, W. L., N.R., 437.

Cumulants, Bibl.: Ayyangar (1938) 444, Cornish 
and Fisher (1937) 453, C. C. Craig (1931c) 
454, Dressel (1940, 1941) 458, Frisch (1926) 
463, Gotaas (1936) 465, Thiele (1931) 494. 
See also k-statistics, Moments. 

Curtiss, J. H., N.B., 216. 

Curve fitting, Bibl.: Elderton and Hansmann 
(1934) 459, Fisher (1912) 461, Jones (1937a) 
472, Kerrich (1935) 473, Koshal (1933, 
1935, 1939) 474, Myers (1934) 479, Nair 
and Shrivastava (1942) 479, Nair and 
Banerjee (1943) 479, K. Pearson (1901c) 483, 
Rhodes (1938a) 488, Roos (1937) 489,
K. Smith (1916) 492, Snow (1911) 492, 
Wald (1940a) 497. See also Least Squares, 
Regression, Trend. 

Curvilinear regression, 145-74. Bibl., Menders- 
hausen (1937a) 477, T. V. Moore (1937) 
478; and see Regression. 

Cycle, 397-8. See Periodicity. 

Cyclical effects, tests for, 124-7, 370. See
Periodicity.


D²-statistic, N.R., 359. Bibl.: Bhattacharya and
Narayan (1942) 446, R. C. Bose (1936a, b)
447, R. C. Bose and Roy (1938c, 1940)
448, S. N. Bose (1935, 1937) 448, Roy
(1939a) 489. See also Discriminatory 
Analysis, Multivariate Analysis. 

Daly, J. F., on shortest confidence intervals, 82 ; 
on bias in tests, 323; N.R., 304. 

Daniels, H. E., (Example 23.2) 183-5; rank
correlations, 441.

Dantzig, G. B., N.R., 304.

David, F. N., confidence intervals for correlations,
HLS iN/oliy, SOeh.

Davis, H. T., time-series, 433, 434; N.R., 394,
437.

Day, E. E., N.R., 245.

Death rates, Bibl., Farr (1919, 1920) 460, Pearson
and Tocher (1916c) 485.

Decomposition of series, Bibl., Anderson (1927)
443, Smirnoff (1935) 491. See also Time-
series.

Decreasing functions, Bibl., C. D. Smith (1939)
491.

Degrees of freedom, of “Student’s” t, 102; of
hypotheses, 270.

De Lury, D., N.R., 137.

Denumerable probabilities, Bibl., Steinhaus (1923) 
492. 

Dependence, see Independence, Correlation. 

Derkson, J. B. D., on stochastic convergence, 440. 

Design, of sampling inquiries, 247-68; pre- 
liminary points, 248-9 ; stratified sampling, 
249-52; design of experiments, 252-4 ; 
orthogonality, 254; replication, 255; 
randomisation, 255-6; sensitivity of a 
test, 256-7; Latin squares, 257-62; con- 
founding, 262; design and randomisation, 
263-6.

Bibl.: Bhattacharya (1943) 446, Chris-
tidis (1931) 451, Fisher (1935c) 462, Jeffreys
(1939e) 471, “Student” (1938) 493, Wold
(1943) 498, Yates (1939e) 502. See also 
Blocks, Factorial Experiments, Latin 
Squares, etc. 

Determinantal equations, Bibl., Girshik (1939)
465. See also Matrix. 

Deviance, footnote, 178. 

Difference, of two means, test of (equal variances) 
109-11; (unequal variances) 111-14. See 
also Behrens’ Test, Two Samples. 

——, of two variances, 115-16.

—— equations, Bibl., Frisch (1932) 463,
Marples (1932) 477. See also Auto-
regression Equations.

Differences of variates, Bibl., Irwin (1937a) 470.

Dilution method, Bibl., R. D. Gordon (1939) 465,
Matuzewski and others (1935) 477.

Dirichlet integrals, 298.

Discontinuous variates, Bibl.: dell’Agnola (1937)
456, Guldberg (1934) 466, Muench (1938)
478, H. W. Norton (1937) 481, Ottestad
(1937, 1938) 481.

Discordant samples, 128.

Discriminatory analysis, discriminant function,
341-8. Bibl.: Barnard (1935) 444, Bartlett
(1939c) 445, Dwyer (1942) 458, Fisher
(1936a, 1938c, 1939b, 1940d) 462, P. L. Hsu
(1939b, 1941a, 1941c) 469, H. F. Smith
(1936) 492, Travers (1939) 495, Wallace
and Travers (1938) 498, Welch (1939b) 498,
Wilks (1938d) 500. See also Multivariate
Analysis.

Dispersion, Bibl., Norris (1938) 481. See Variance,
etc.

—— matrix, 330, 341, N.R., 358.

Dissection of frequency-distributions, Bibl., Burrau
(1934) 450.

Distributed lags, see Lags.

Distributions, generally, Bibl.: Ambarzumian
(1937) 443, Baten (1933a) 445, (1934) 446,
Bispham (1922) 447, Bochner and Jessen
(1934) 447, Bochner (1937) 447, Bowley
(1933) 448, Burr (1942) 450, Camp (1937)
450, Cannon and Wintner (1935) 450,
Chapelin (1932) 451, Cramér and Wold 
(1936) 454, Edgett (1931) 458, Eyraud 
(1938a) 459, Glivenko (1933) 465, Guldberg 
(1935) 466, Hansmann (1934) 467, Hartman 
and others (1937) 467, (1939) 468, Haviland 
(1934a, b, 1935, 1939) 468, R. Henderson
(1907) 468, Jessen and Wintner (1935) 
471, Khintchine (1937a) 473, Kullback 
(1936b) 474, Mazzoni (1934) 477, K. Pearson 
(1923c, 1924a) 485, R. Schmidt (1934) 490,
von Mises (1939a) 497. 

Dodd, E. L., period generated by moving average, 
384, N.R., 394. 

Doob, J., N.R., 45. 

Dosage-mortality, Bibl., Garwood (1941) 464. 

——-response, Bibl., Irwin and Cheeseman (1939)
470.

Dugué, D., N.R., 45.

Duration of play, Bibl., de Finetti (1939b) 456,
Fieller (1931a) 460.


Eden, T., on Fisher’s distribution, 206, (Example 
23.8) 214, N.R., 216. 

Edgeworth, F. Y., N.R., 45.

Edwards, J., Integral Calculus, footnotes, 44 and 
50. 

Efficiency, of estimators, 5-7; of maximum 
likelihood estimators, 18-19; of moments 
in fitting Pearson curves, 43-4 ; of sampling, 
Bibl., Yates and Zacopanay (1935c) 502. 

Egg-production, in laying hens, (Table 29.5, 
Figure 29.5) 368. 

Egyptian skulls, (Example 28.3) 345-8. 

Elasticity of demand, Bibl., Mosak (1939) 478,
Schultz (1933) 490. 

Elderton, E. M., (Example 21.14) 133, N.R., 266. 

Elderton, Sir William P., N.R., 45. 

Electric lamps, testing of, (Example 23.1) 179-80. 

Elimination of variates, in regression analysis, 
167-70. 

Enumeration in sampling, Bibl., Cochran (1939b) 
452. 

Equidetectability, curves of, 318. 

Equimodal distributions, Bibl., Mouzon (1930) 478.

Error, in variance-analysis, 187. 

Errors, of first and second kind, 270, (Exercise 
26.5) 305. 

——, general theory of, Bibl.: Brelot (1936, 
1937) 449, Campbell (1935) 450, Cramér 
(1928) 454, Deming and Birge (1934) 456, 
Edgeworth (1905, 1906) 458, Jeffreys (1933, 
1937c, 1938d, 1939d) 471, Mahalanobis 
(1922) 476, Wertheimer (1932) 499. See 
also Least Squares. 

Estimation, generally, 1-49, 50-62; in analysis 
of variance, 181, 218-19.

Estimator, definition, 2; consistence of, 3; bias 
of, 3-4; efficiency of, 5-10; sufficiency of,
7-12; approximation to, 22-4; most
general sufficient form, 24-5; accuracy of,
28-9; ancillary, 32-3; in multivariate
case, 33-42; location and scale, 40-2;
by minimum variance, 50-5; by minimum
χ², 55-8; by inverse probability, 58-9;
by least squares, 59-60. See also Maxi-
mum Likelihood, Minimum Variance.
Bibl.: Aitken and Silverstone (1942)
443, Beall (1939) 446, S. S. Bose and
Mahalanobis (1938b) 448, Darmois (1935,
1936) 455, O. L. Davies and Pearson (1934)
455, Doob (1936) 457, Dugué (1936a, b,
1937b) 458, Fisher (1925b) 461, (1934d,
1938b, d) 462, Geary (1942, 1944) 464,
Halphen (1939) 467, Neyman (1937b) 480,
E. S. Pearson (1937a, 1939) 483, Pitman
(1937b, 1939a) 486, Wald (1939a) 497.

Expectation of life, see Life. 

Expected values, see Mean Values. 

—— case, in sociological data, Bibl., Stouffer and
Tibbits (1933) 493.

Expenditure of families, (Example 23.9) 214-15. 

Exponential distribution, (Exercise 26.8) 305-6. 
Bibl., Paulson (1941) 482, Sukhatme (1936b) 
493. 

Extra-sensory perception, Bibl., Greenwood and 
Stuart (1937) 465, Stevens (1939b) 493.

Extremes, distribution of, Bibl.: Daniels (1941) 
455, de Finetti (1932) 455, Dodd (1923) 
456, Fisher and Tippett (1928a) 461, 
Gumbel (19384, 1935a) 466, McKay (1935) 
477, Olds (1935) 481, Tippett (1925) 495. 
See also mth Values. 


F-distribution (variance ratio), Bibl., Merrington 
and Thompson (1943) 478. See Fisher’s 
Distribution. 

Factor analysis (psychology), Bibl.: Bartlett 
(1937e) 445, W. Brown (1935) 449, Burt 
(1937a, b, 1938a, b) 450, Camp (1932, 1934)
450, Darmois (1934) 455, Emmett (1936)
459, Hoel (1937, 1939) 468, Irwin (1933)
470, Ledermann (1938) 475, Roff (1937)
489, Thomson (1916, 1919b, 1939) 494,
Thurstone (1935, 1938) 495. 

Factorial experiments, 199-202. Bibl.: Barnard 
(1936) 444, R. C. Bose and Kishen (1941) 
448, Cornish (1936, 1940a, c) 453, Goulden
(1937, 1938) 465, P. L. Hsu (1943) 470, 
Kishen (1940) 473, Wishart (1938) 501,
Yates (1937b) 502.

—— moments, Bibl., Gonin (1936) 465, Ottestad
(1939) 481.

—— sums, in fitting regressions, (Example 22.8)
164-5.

Factorisation of variables, Bibl., S. C. Dodd (1927)
457.

Families of alternatives, 275-6. 

Feller, W., N.R., 303. 

Fiducial inference, 85-95. Bibl. : Bartlett (1939a) 
445, Fisher (1933, 1935a, 1935b, 1936c,
1937b, 1939a, 1940c, 1941a) 462, Garwood
(1936) 464, Ricker (1937) 488, Segal (1938)
491, Wilks (1938b, c) 499, (1939a, b) 500.
See Confidence intervals. 

Field experiments, Bibl., Wishart and Saunders
(1935) 501. See Design.

Fifteen-constant surface, Bibl., K. Pearson (1925a)
485. 


Inlory, Il, ING (Eh, AY /R, TS. 

Finite populations, sampling from, Bibl. : Church 
(1926) 452, Hansen and Hurwitz (1940) 
467, Irwin and Kendall (1944) 470, Isserlis 
(1918c, 1931) 470, Neyman (1925) 480, 
O’Toole (1934) 481, Sukhatme (1944) 494, 
Tschuprow (1918b, 1921, 1923) 495. 

Finney, D. J., z-test, 199; test of significance in 
periodogram analysis, 434; N.R., 137, 216.

Fisher, R. A., fitting by moments, 43; fiducial 
probability, 90; tables for Behrens’ test, 
92, 93, 111; expansion of “ Student’s ” 
integral, 101; tables of t, 102; difference
of two means, 110; z-distribution, 116,
117; configuration of a sample, 127; 
fitting regressions, 165; theorem on sum 
of squares, 176-7; design of experiments, 
263; discriminatory analysis (Example 
28.2) 342-4; distribution of canonical 
correlations, 357 ; significance of a periodo- 
gram, 434; N.R., 45, 61, 83, 94, 136, 173, 
216, 245, 266, 359. 

Exercises from: (Exercise 17.1) 45,
(Exercises 17.4, 17.5, 17.6) 46, (Exercises
17.12, 17.15, 17.16) 48, (Exercise 17.19) 49,
(Exercise 18.3) 61, (Exercises 20.1, 20.2)
94-5.

Fisher’s distribution (z-distribution), properties of, 
116-18; in variance analysis, 179, 199; 
in non-normal case, 205-6, 234-6, (Example 
26.8) 289-91; in linear hypothesis, 301 ; 
in discriminatory analysis, 345. 

Bibl.: Aroian (1941) 444, R. A. Chap-
man (1938) 451, Cochran (1940a) 452,
Daniels (1938a) 454, Eden and Yates (1933) 
458, Fisher (1924c) 461, P. L. Hsu (1941c) 
469, Lawley (1938) 475, McCarthy (1939) 
477, Paulson (1942) 482, Welch (1937) 498. 

Fitting, see Curve Fitting, Least Squares. 

Flood flows, Bibl., Gumbel (1938a, 1941) 466. 

Fluctuations in time-series, Bibl., R. A. Gordon 
(1937) 465. See Time-series. 

Forecasting, Bibl. : Cowles (1933) 453, Cowles and 
Jones (1937) 453, de Finetti (1937) 456, 
Schultz (1930) 490, Yates (1936c) 502.

Forsyth, A. R., Calculus of Variations, footnote, 50.

Fourier analysis, see Harmonic Analysis, Period- 
icity. 

Fragmentary samples, Bibl., Wilks (1932a) 499. 

Frankel, L. R., N.R., 136, 266.

Freedom, degrees of, see Degrees of Freedom. 

Frequency-distributions, see Distributions. 

Frequency theory of probability, Bibl.: Campbell
(1939) 450, Cantelli (1923, 1932, 1933a) 450,
(1936) 451, Dörge (1934, 1936) 458, von
Mises (1931) 497. See Probability, Random 
Sequence. 

Friedman, M., (Example 23.9) 214-15. 

Frisch, R., N.R., 358.


Galton’s problem, Bibl.: Galton (1902) 464, Irwin
(1925a) 470, K. Pearson (1902c) 484. See 
Rank Correlation. 

Gamma distribution, Bibl., Kibble (1941) 473.
See Type III. 

Garwood, F., confidence intervals for Poisson dis- 
tribution, 81. 

Gauss, K. F., variance of residuals, 60-1; stan-
dard errors, 153; N.R., 45.

Gaussian distribution, see Normal Population. 

Geary, R. C., distribution of t, 102-4; test of
normality, 106; theorem on independence,
118; (Exercises 21.1, 21.2) 137-8; N.R.,
45, 136. 

Geary’s ratio, Bibl., Geary (1935a, b, 1936a) 464, 
Tricomi (1937) 495. 

General factor (intelligence), see Factor Analysis. 

Generalised distance, of Mahalanobis, N.R., 359. 

Generating functions, Bibl., Aitken (1931) 442. 
See Characteristic Functions. 

Geometric Mean, Bibl., Camp (1938a) 450, Norris 
(1938, 1940) 481. 

Germination of wheat-seeds, (Example 23.7) 207-9. 

Gini’s mean difference, 108. 

Girshik, M. R., (Exercise 28.11) 362, N.R., 359. 

Glass, seed in, (Example 23.6) 202-4. 

Goodness of fit, tests of, 106-9. Bibl. : David 
(1939) 455, Neyman (1937a) 480, K. Pear- 
son (1934) 486, Thomson (1919a) 494. See 
Chi-squared. 

Gosset, W. S. (“Student”), 80, 266, N.R., 394.

Gould, C. E., (Example 23.6) 202-4. 

Goulden, C. H., N.R., 216, 266. 

Grades, see Rank Correlation, Galton’s Problem. 

Graduation, Bibl., Aitken (1933a, b, c) 442, Key-
fitz (1938) 473. See Interpolation, Least
Squares, Orthogonal Polynomials, Trend. 

Graeco-Latin square, 261-2. Bibl., R. C. Bose 
(1938b) 448. 

Gram-Charlier series, estimation in (Exercise 18.1)
61; for non-normal t, 103; goodness of
fit in, 109; in z-distribution, 116. Bibl.:
Aitken and Oppenheim (1931) 442, Aitken
(1932) 442, Aroian (1937) 444, Baker
(1930d, 1935) 444, Charlier (1906, 1912,
1928, 1931) 451, Cornish and Fisher (1937)
453, C. C. Craig (1931b) 454, Cramér (1926,
1935b) 454, Doetsch (1934) 457, Edgeworth
(1905) 458, Gram (1879) 465, Hildebrandt
(1931) 468, Jacob (1933, 1935, 1937) 471,
Meisener (1938) 477, Quensel (1938) 487,
Samuelson (1943) 490, Schmidt (1934) 490,
Steffensen (1930) 492, Wicksell (1917b,
1934a) 499.

Greenstein, B., N.R., 437. 

Grouping corrections, Bibl.: Abernethy (1933) 
442, Alter (1939) 443, Baten (1931) 445, 
Blame! (1939) 447, Burkhardt and Stackel- 
berg (1939) 449, Carver (1933, 1936) 451, 
C. C. Craig (1936c, 1941b) 454, Elderton
(1933, 1938b) 459, Kendall (1938a) 472,
Lewis (1935) 475, Sandon (1924) 490. 

——, effect on correlations, Bibl., Gehlke and
Biehl (1934) 464.

——, significance of, Bibl., Stevens (1937b) 493.

Groups of experiments, Bibl., Yates and Cochran
(1938b) 502.


Hampton, W. M., (Example 23.6) 202-4. 

Hansmann, G. H., N.R., 45.

Harmonic analysis, Bibl.: T. F. Anderson (1935)
443, Brunt (1928) 449, Carslaw (1930) 451, 
Fisher (1929a) 461, (1940a) 462, Frisch 
(1928, 1931, 1933) 463, Pollak (1926) 487, 
Turner (1913) 496, Wiener (1930) 499. 
See Periodicity. 

—— mean, Bibl., Norris (1939) 481. See Mean
Values.

Hartley, H. O., on z-test, 199; k samples, 299;
N.R., 137, 216, 304.

Heads and tails, Bibl., Fieller (1931c) 460. See
Duration of Play.

Hendricks, W. A., (Exercise 21.9) 139; N.R., 136.

Hermite polynomials, see Tchebycheff-Hermite
Polynomials.

Heterogeneous populations, Bibl., Baker (1930c,
1932) 444. See also Lexis Theory, Strati-
fied Sampling.

Hierarchies in correlation, Bibl., Thomson (1916,
1919b, 1935) 494, Wilson (1928) 500. See
Factor Analysis.

Higham, J. A., (Exercise 29.7) 395.

Highest audible pitch, (Example 22.4) 152-3,
(Example 22.5) 155-6.

Hirschfeld, H. O., see Hartley, H. O.

Homogeneity, Bibl.: Baker (1941) 444, Hartley
(1940) 467, Welch (1938a) 498. See k
samples.

Horse population and wheat prices, 436.

Hotelling, H., canonical correlations, 348-58;
(Exercises 28.7-28.10) 360-2; N.R., 45,
136, 359.

Hotelling’s T, 323, 335-8; N.R., 359. Bibl.,
Hotelling (1931) 469, P. L. Hsu (1938c) 469.

Hsu, P. L., linear hypothesis, 301; Wishart’s
distribution, 333; canonical correlations,
357; N.R., 304, 359.

Hypergeometric series, Bibl.: Ayyangar (1934) 
444, Camp (1925a) 450, O. L. Davies (1933, 
1934) 455, Gonin (1936) 465, K. Pearson 
(1899b) 483, (1924b, c) 485, Romanovsky
(1925b) 489. 

Hypotheses, testing of, see Statistical Hypotheses. 


Imaginary random variables, Bibl., Eyraud (1938b) 
459. 

Immunity, Bibl., Brownlee (1905) 449. 

Incomes, distribution of, Bibl., Cantelli (1929)
450, Darmois (1933) 455. 

Incomplete blocks, see Blocks. 

Independence, of quadratic forms, Bibl. : Cochran 
(1934) 452, A. T. Craig (1936a, 1943) 453, 
Madow (1940) 476.

——, statistical, Bibl.: del Vecchio (1933) 456,
Kac and van Kampen (1939) 472, Marcin- 
kiewicz and Zygmund (1937) 477, Tschu- 
prow (1934) 496. See also Correlation, 
Contingency, etc. 

Index, distribution of, see Ratio. 

—— numbers, Bibl.: Bowley (1926) 448, Clare-
mont (1916) 452, Crowther (1934) 454,
Dodd (1937c) 457, Edgeworth (1925a, b, c)
459, I. Fisher (1922) 460, Flux (1921, 1933)
463, Frickey (1937) 463, Frisch (1930) 463,
Haberler (1927) 467, Konüs (1939) 474,
Persons (1928) 486, Rhodes (1936) 488, 
Schultz (1939) 490, Yates (1939c) 502. 

Indices, correlation of, Bibl.: Baker (1937) 444, 
J. W. Brown and others (1914) 449, Clare-
mont (1916) 452.

Industrial accidents, Bibl., Newbold (1927) 479. 

—— processes, see Quality Control.

Inequalities, Bibl.: Mortara (1934) 478, Narumi 
(1923b) 479, Norris (1935, 1937) 481, 
Romanovsky (1938) 489, Shohat (1929) 
491, C. D. Smith (1930) 491, von Mises 
(1939b) 497, Wald (1938) 497. 

Infantile mortality, Bibl., Feld (1924) 460. 

Infection in potatoes, (Example 24.5) 230-2, 
(Example 24.6) 232-3. 

Inference, see Statistical Hypotheses. 

Information, amount of, 29-30; loss of, 30-2; 
in minimum χ², 57-8. Bibl.: Bartlett
(1936a, b) 445, Fisher (1934b, 1935a) 462.

Intensity, of a periodogram, 425. 

Interaction, in variance-analysis, 187, 188-9. 

Interference, analysis of, Bibl., Stevens (1936) 493.

Interpolation, Bibl.: Comrie (1936) 452, Erdős
and Turán (1937, 1938) 459, Feldheim
(1936a) 460, Fisher and Wishart (1927)
461, Gini (1921) 465, Lidstone (1937) 476,
Pietra (1932b) 486, Salvemini (1934) 490,
Simaika (1942) 491, Tchebycheff (1907) 
494. See also Graduation, Least Squares, 
Orthogonal Polynomials. 

Intra-class correlation, 181. Bibl., Harris (1914)
467, Harris and Gunstad (1931) 467. 

Intrinsic accuracy, in estimation, 28-9. 

Invariants of frequency curves, Bibl., Zoch
(1934) 503.

Inverse probability, in estimation, 58-9; relation-
ship with fiducial inference, 90-1, 93-4.
Bibl.: Bayes (1763) 446, Fisher (1926c,
1930a) 461, (1932, 1935a) 462, Isserlis (1936)
471, Jeffreys (1937b) 471, Tornier (1937)
495, Wisniewski (1937b) 501.

Iris (flower), (Example 28.2) 342-4.

Irregular Kollektiv, 123. See Random Sequence.

Irwin, J. O., (Exercise 23.1) 216-17; sampling
moments, 440; N.R., 216.

Item analysis, Bibl., Merril (1937) 478.

Iterations, see Runs.


J-shaped distributions, Bibl., Elderton (1933)
459, Solomon (1939) 492.

Jackson, W. R., N.R., 304.

Jeffreys, H., (Example 18.5) 56-7; fiducial
inference, 90-1, 93-4; N.R., 61, 94, 266.

Jensen, A., N.R., 266.

Joint sufficiency, 39.

Judgments, validity of, Bibl., Eysenck (1939) 459.


k samples, problem of, 119-22, 295-9; bias in, 
323, (Exercise 27.2) 326. Bibl.: Bartlett 
(1934a) 445, Bishop (1939) 447, Bishop and 
Nair (1939) 447, R. C. Bose and Roy (1940) 
448, G. W. Brown (1939) 449, Neyman 
and Pearson (1931b) 480, Pearson and
Wilks (1933b) 482, Sukhatme (1936b) 493,
(1937b) 494, Welch (1935) 498, Wilks
(1935b) 499. See L-tests.

k-statistics, Bibl.: Fisher (1929b) 461, Fisher and
Wishart (1931) 462, C. T. Hsu and Lawley
(1939) 469, Kendall (1940) 472, (1942b) 473,
Wishart (1929a, b, 1930, 1933b) 500. See 
also Moments, sampling. 

Kelley, T. L., (Example 28.4) 351-2. 

Kermack, W. O., N.R., 136. 

Keynes, Lord, (Exercise 17.7) 47. 

Kolmogoroff, A., confidence intervals for ter- 
minals, 83. 

Kolodzieczyk, St., linear hypothesis, 293; N.R.,
304.

Koopman, B. O., (Exercises 17.13, 17.14) 48, 
IN Rion, GED, 


Koshal, R., N.R., 45.
Kronecker delta, 329.

Kurtic curve, 142.
Kurtosis, Bibl., Frisch (1934a) 464. 


L-tests, Bibl.: Mahalanobis (1933) 476, Mood
(1939) 478, Nayer (1936) 479, Paulson 
(1941) 482, Welch (1936a) 498, Wilks and 
Thompson (1937a) 499. See k samples. 

Lag correlation, 435-6. 

Lags, distributed, Bibl.: Alt (1942) 443, Koop- 
mans (1941) 474, K. R. Nair (1936) 479, 
Zrzavy (1933) 503. 

Lanarkshire milk investigation, N.R., 266. 

Large numbers, law of, see Convergence in Proba- 
bility. 

Largest member of a sample, see Extremes. 

—— of a set of variances, see Variance ratio.

Latent roots of a matrix, see Matrix. 

Latin squares, 257-62, 266. Bibl.: R. C. Bose
(1938b) 448, R. C. Bose and Nair (1942b)
448, Euler (1782) 459, Fisher and Yates
(1934c) 462, Fisher (1942d, e) 462, Mann
(1943) 477, H. Norton (1939) 481, Stevens
(1938b) 493, Welch (1937) 498, Yates (1933c)
501, (1936a) 502.

Lattices, distributions on, van Kampen and
Wintner (1939b) 496.

Lawley, D. N., N.R., 359.

Least squares, in estimation, 59; in regression
analysis, 145; in time-series, 371. Bibl.:
Adcock (1878) 442, Aitken (1933a, b, c,
1935a) 442-3, Davis (1933) 455, David and
Neyman (1938c) 455, Deming (1931, 1934,
1935, 1937) 456, Hendricks (1931, 1934)
468, H. Johnson (1940) 471, Jones (1937a)
472, Jordan (1932, 1934) 472, Kerrich (1937)
473, Sheffer (1935) 491, Sheppard (1914,
1929) 491, Sterne (1934) 493, Wisniewski
(1937a) 501, Wong (1935) 501.

Lexis, W., ratio, 119; N.R., 216.

—— theory, Bibl.: Geiringer (1942) 465, Rider
(1934) 488, Tschuprow (1918, 1919a) 495,
von Bortkiewicz (1931) 497.

Life, expectation of, etc., Bibl.: Brownlee and
Morison (1911) 449, Dublin and others
(1935) 458, Greenwood (1922) 466, Gumbel
(1924, 1925, 1932) 466, Seal (1940) 490,
Wilson (1938) 500.

Likelihood, in estimation, see Maximum Likeli-
hood; in testing hypotheses, 277-80, 295-
302, 323-6. Bibl., Fisher (1932, 1934a, b)
462, Wilks (1935a) 499.

Likelihood-ratio tests, Bibl.: Daly (1940) 454,
Neyman and Pearson (1933c) 480, Wilks
(1938a) 499, Wilks and Thompson (1937a)
499. See L-tests.

Limiting form of significance tests, 322. Bibl.,
Peiser (1943) 486.

Linear equations subject to error, Bibl., Lonseth 
(1942) 476. 

—— hypotheses, 292-5, 300-2. Bibl., Johnson
and Neyman (1936) 472, Kolodzieczyk 
(1935) 474. 

Linearity of regression, see Regression. 

Linkage, Bibl., Finney (1940, 1941, 1942) 460,
N. L. Johnson (1940b) 472.

Link-relatives, Bibl., Robb (1930) 489. See Index 
Numbers. 

Live births, proportion of males among, (Example 
21.8) 120. 

Location, estimation of parameters of, 40-2; 
centre of, 41; Pitman’s tests of, 323-6. 
Bibl., Pitman (1939a, b) 486.

Logarithmic variate, Bibl.: Finney (1941b) 460,
Jenkins (1932) 471, Nydell (1919) 481, 
Pae-Tsi-Yuan (1933) 481, Quensel (1936) 
487, Wicksell (1917a) 499, Williams (1937) 
500. 

Loss of information, in estimation, 30-2. 

—— weight in soil, (Example 22.3) 149-52,
(Example 22.6) 158.


m rankings, problem of, (Example 23.9) 214-15. 
Bibl., Friedman (1937, 1940) 463, Kendall 
and Babington Smith (1939b) 472. 

Macaulay, F. R., (Exercise 29.4) 395; N.R., 394. 

MacStewart, W., N.R., 304.

Madow, W. G., N.R., 359. 

Magnetic declination, Bibl., Schuster (1899) 490. 
Magnitude, random division of, Bibl., Fisher 
(1940a) 462, Stevens (1939a) 493. 

Mahalanobis, P. C., N.R., 303, 304, 359. 

Males, proportion in births, (Example 21.8) 120; 
marriages of, (Example 21.9) 121-2. 
Markoff, A. A., theorem on least squares, (Exercise
25.5) 267.
—— process (Markoff chains), Bibl.: Doeblin
(1936, 1937) 457, Elfving (1937, 1938) 459,
Feldheim (1936b) 460, Fortet (1935-8) 463,
Fréchet (1935, 1936b, 1937a) 463, Geiringer
(1938) 464, Hadamard and Fréchet (1933)
467, Hostinsky (1937) 469, Kolmogoroff
(1937b) 473, Lévy (1935b, 1936c) 475,
Markoff (1912) 477, Mihoc (1934) 478,
Onicescu and Mihoc (1935-9) 481, Roman-
ovsky (1936a) 489, Ščukarev (1932) 490.
Marriage, males according to age at, (Example 
21.9) 121-2. 
—— rate in England and Wales, (Table 30.2) 397,
(Example 30.3, Table 30.5, Figure 30.4)
408-9.
Martin, E. S., N.R., 359.
Mass production, see Quality Control. 
Matching problems, Bibl.: Battin (1942) 446, 
D. W. Chapman (1935) 451, J. A. Green-
wood (1938) 465, (1940) 466, Greville (1938,
1941) 466, Olds (1938a) 481, Vernon (1936) 
496, Wilks (1932c) 499. 

Mathematical Tripos, distribution of women 
obtaining firsts in, (Example 18.5) 56-7. 

Matrix, arithmetic of, Aitken (1937a, b, 1938) 443, 
Bingham (1941) 447, Dwyer (1941a, b) 458,
Hotelling (1943) 469. 

Maximum likelihood estimators, 12-49; con-
sistence, 13-15; normality, 15-17; variance
of, 17-18 ; efficiency of, 18-19; sufficiency, 
19-20; for several parameters, 34-49; 
variance and covariance of, 36-7; relation 
with minimum variance, 53, and with con- 
fidence intervals, 73-4. 

Bibl. : Carlson (1932) 451, Fisher (1912, 
1921a, 1925b, 1928c) 461, (1932, 1934a)
462, Hotelling (1930) 469, Jeffreys (1938b,
1938c) 471, Koshal (1933, 1935, 1939) 474,
Myers (1934) 479, E. S. Pearson (1937a)
483, K. Pearson (1936) 486, Welch (1939c)
499.
McKendrick, A. G., N.R., 136. 
Mean, arithmetic, estimation of, 2; (Example 
17.6) sufficient estimator for, 11 ; (Example 
17.7) 19-20; most general distribution for 
which it is estimator (Example 17.10) 22; 
significance of, 98-100, (Examples 27.1,
27.2) 311-12. 
—— deviation, in testing normality (Geary’s
ratio), 106; distribution of m.d., Bibl.:
Fisher (1920) 461, Fréchet (1936a) 463,
Tricomi (1936b, 1937) 495.
—— difference, 108. Bibl.: Cantelli (1913) 450,
de Finetti and Paciello (1930b) 455, de
Finetti (1931) 455, U. S. Nair (1936) 479,
Wold (1935) 501.
—— values, Bibl.: Aumann (1934-5) 444, Bunak
(1936) 449, A. T. Craig (1936b) 453, Dodd
(1934, 1937a, b, c, 1938) 457, Doodson (1917)
458, Dressel (1941) 458, Norris (1935, 1937)
481, Wertheimer (1937) 499, Yasukawa
(1925) 501, Zoch (1935, 1937) 503.
Means, distribution of, Bibl. : Baker (1930d, 1931, 
1932, 1936, 1940) 444, Behrens (1929) 446, 
R. C. Bose (1938a) 448, Carlson (1932) 451, 
Cochran (1937a) 452, A. T. Craig (1932) 
453, Dodd (1926-7) 456, Dunlap (1931) 458, 
Hall (1927b) 467, Holzinger and Church
(1929) 469, Irwin (1927, 1929a, 1930) 470,
Immer (1937) 470, Isserlis (1918a) 470, 
Jeffreys (1940) 471, Kolmogoroff (1929) 473, 
Pizzetti (1939) 487, Pollard (1934) 487, 
Rhodes (1927) 488, Romanovsky (1929) 
489, Simon (1943) 491, Truksa (1940) 495. 
See also Central Limit Theorem, Mean 
Values. 
——, test of difference, see Difference; in multi-
variate analysis, 338-41.

Mean-square contingency, see Contingency. 

—— successive difference, Bibl.: Hart (1942)
467, von Neumann and others (1941a, b)
497, J. D. Williams (1941) 500. 

Median, as estimator, 5; confidence intervals for, 
(Exercise 19.5) 84. Bibl.: Cisbani (1938)
452, Doodson (1917) 458, Gini and Galvani
(1929) 465, Gini (1938) 465, Gini and
Zappa (1938) 465, Gulotta (1938) 466,
Haldane (1942b) 467, Hojo (1931, 1933)
469, Jackson (1921) 471, K. R. Nair (1940b)
479, K. Pearson (1931b) 486, Pollard (1934)
487, Savur (1937a) 490, W. R. Thompson 
(1936) 494, Ville (1936c) 496. 

Migration, see Random Migration. 

Minimum variance, of maximum likelihood esti-
mators, 18-19; in estimation, 50-5.

—— χ², in estimation, 55-8.

Missing plot technique, 229-33. Bibl.: Allan
and Wishart (1930) 443, Cornish (1940a, b)
453, K. R. Nair (1940a) 479, Yates (1933b)
501, Yates and Hale (1939b) 502.

Mode, Bibl.: Doodson (1917) 458, Haldane
(1942b) 467, K. Pearson (1902b) 484,
Yasukawa (1926) 501.

Moment-function, Bibl., U. S. Nair (1939) 479.
See Characteristic Functions, Generating
Functions.

Moments, efficiency of, 43-4.

—— of distributions (specification), Bibl.: Corn-
ish and Fisher (1937) 453, Fisher (1937a)
462, R. Henderson (1907) 468, O’Toole
(1933) 481, Pearl (1937) 482, K. Pearson
(1936) 486, Romanovsky (1936b) 489, von
Mises (1937) 497. See Curve Fitting.

——, problem of, Bibl.: Bodewadt (1936) 447,
Broggi (1934) 449, Chlodovsky (1938) 451,
Hamburger (1920, 1921) 467, Hausdorff
(1923) 468, Haviland (1935, 1936) 468,
Marcinkiewicz (1939) 477, Polya (1920,
1938a) 487, Stekloff (1914) 492, Stieltjes
(1918) 493, Widder (1934) 499.

——, sampling, Bibl.: Bernstein (1932) 446,
C. C. Craig (1928) 453, (1940) 454, Dwyer
(1937a, 1938, 1940) 458, Fisher (1929b)
461, Fisher and Wishart (1931) 462, Geary
(1933) 464, Irwin and Kendall (1944) 470,
Isserlis (1918b, c, 1931) 470, St. Georgescu
(1932) 493, Sukhatme (1938c, 1944) 494,
Tschuprow (1918b, 1921, 1923) 495, Wilks
(1934, 1936) 499, Wishart (1929a, b, 1930,
1931a, b, 1933b) 500, Wishart and Bartlett
(1932b) 500, Ziaud-din (1938) 503. See
also k-statistics.

Monotonic functions, in distribution theory, Bibl.,
Bochner (1937) 447.

Mood, A. M., N.R., 304.

Moore, G., phases in time-series, 126; N.R., 136.

Morant, G., N.R., 394. 

Morgan, W. A., N.R., 137. 

Mortality, see Life. 

Most-efficient estimator, 6, 10, 18-19. 

Most-selective confidence intervals, 75, 82. 

Moths, effect of weather on, (Example 22.10) 
171-2. 

Moving averages, 372-87, 399. Bibl.: Dodd
(1939a, 1941a, b) 457, Frisch (1938) 464,
Wold (1938b) 501.

mth values, Bibl., Gumbel (1934, 1935a, 1939) 
466. 

Multinomial distribution, Bibl., Kullback (1937) 
474, Lurquin (1937) 476. 

Multiple correlation, Bibl.: Bacon (1938) 444,
R. C. Bose (1934) 447, Fisher (1928b) 461,
Hall (1927a) 467, Kelley and McNemar
(1929) 472, Kullback (1936c) 474, K. Pear-
son and Lee (1908) 484, K. Pearson (1916d)
485, K. Pearson and Young (1918) 485,
Soper (1929a) 492, Starkey (1939) 492,
Tappan (1927) 494, Wilks (1932b) 499,
Wishart (1931b) 500, Wong (1937) 501.

—— curvilinear regression, 167, 236. See Re-
gression.

—— happenings, Bibl., Greenwood and Yule
(1920) 466, K. Pearson (1912b, 1913) 484.
See Poisson Distribution, Pólya Distribu-
tion.

Multivariate analysis, 328-62; Wishart’s distri- 
bution, 330-4; Hotelling’s distribution, 
335-8; significance of set of means, 338-
41; discriminatory analysis, 341-8;
canonical correlations, 348-58. 

Bibl.: Bartlett (1939b, 1941) 445, Bishop
(1939) 447, Fisher (1936a, b, 1938c, 1939b,
1940d) 462, Hotelling (1933, 1936a, b) 469,
P. L. Hsu (1939b, 1941a, c, d) 469, Madow
(1937, 1938) 476, Mahalanobis (1930, 1936a)
476, Mahalanobis and others (1936b) 476,
Martin (1936) 477, Rider (1936) 488, Roy
(1938, 1939a, b, 1942a, b) 489, Simonsen
(1937) 491, Wald and Brookner (1941b)
498. 

—— distributions, estimation in, 33-7; normal, 
see Normal. Bibl.: Leser (1942) 475,
Lukomski (1939) 476, Mahlmann (1935) 
477. See also Multiple Correlation. 

Myers, R. J., N.R., 45. 


Nair, K. R., confidence intervals for median, 81, 
INGA Soe 

Nayer, P. N., testing hypotheses, 299; N.R., 304. 

Negative binomial, Bibl., Fisher (1941b) 462, 
Greenwood and Yule (1920) 466. See Pólya
Distribution. 

Neyman, J., confidence intervals, 75-6; Behrens’ 
test, 93; randomised blocks, 214; theory
of tests, 270, 299, 308, 311, 323; Exercises
from: (Exercises 19.2, 19.3) 83, (Exercise 
21.12) 140, (Exercises 26.2, 26.3) 304, 
(Exercises 26.4, 26.5) 305, (Exercise 27.3) 
327. N.R., 45, 83, 94, 136, 172, 266, 303, 
304, 326. 

Nisbet, S. D., (Example 25.1) 258-9. 

Non-central confidence intervals, 66. 

—— t, Bibl., N. L. Johnson and Welch
(1940a) 471.

Non-normal data, in variance-analysis, 205-15.

—— populations, Bibl.: Baker (1934) 444,
Bartlett (1935a) 445, C. C. Craig (1941a) 
454, Geary (1936b) 464, Laderman (1939) 
474, A. N. K. Nair (1942) 479, Pearson and 
Adyanthaya (1928, 1929) 482, E. S. Pearson 
(1931b) 482, Rider (1931a) 487, Rietz (1932, 
1939) 488, Thorndike (1937) 494. 

Non-orthogonal data, Bibl.: K. R. Nair (1942) 
479, Wilks (1938e) 500, Yates (1934a) 501. 

Non-parametric tests, 322. Bibl., Scheffé (1943)
490. 

Non-random samples, Bibl., “Student” (1909)
493. 

Nonsense correlations, Bibl., Yule (1926) 503. 

Normal equations, solution of, Bibl., Hoel (1941)
468.

—— population, estimation of mean, 2, (Example
17.6) 11, (Example 17.7) 19-20, (Example
18.1) 51; estimation of variance, (Example
17.6) 11, (Example 18.4) 54-5; centre of
location of, (Example 17.22) 42; confidence
intervals for mean, (Example 19.1) 63-4,
(Example 19.3) 70; fiducial distribution,
85; bivariate, (Example 17.17) 33-4,
(Example 17.18) 37-8; regressions of,
(Example 22.1) 144.

Bibl.: Baker (1931) 444, Bergström
(1918) 446, Cramér (1923, 1936) 454, Erdős
and Kac (1939) 459, Haldane (1942a, b)
467, C. T. Hsu (1940, 1941) 469, Isserlis
(1918b) 470, Kac (1939) 472, Khintchine
(1935) 473, Kullback (1935a) 474, Ledermann
(1939) 475, Lehmann (1939) 475,
Lengyel (1939) 475, K. Pearson (1924c) 485,
Pólya (1923) 487, Raikov (1938) 487,
Rhodes (1928) 488, Tricomi (1935, 1936a,
1936b) 495, Yule (1938b) 503.

Normalisation of frequency functions, Bibl.:
Cornish and Fisher (1937) 453, Haldane
(1938) 467, Mahalanobis and others (1936b)
476, Paulson (1942) 482.

Normality, tests of, 105-6. Bibl. : Fisher (1930b) 
461, Geary (1935a, b, 1936a) 464, Geary
and Pearson (1938) 464, E. S. Pearson 
(1930, 1935c) 482, Yasukawa (1934) 501. 

Nuisance parameters, 134. Bibl., Hotelling (1940) 
469. 




Olds, E. G., N.R., 266.
Omega, for testing goodness of fit, 107-9. Bibl.,
Smirnoff (1936) 491. 
One-sided confidence intervals, 76. 
Oppenheim, S., N.R., 437. 
Order, in random series, 122-4, and see Random 
Order. 
Orthogonal data, in variance-analysis, 219, 254. 
—— polynomials, 146-54, 159-67. Bibl.: Aitken
(1932, 1933a, b, c) 442, Allan (1930) 443,
Dieulefait (1934b) 456, Fisher (1921b, 1924b)
461, Greenleaf (1932) 465, Jackson (1934,
1937, 1938) 471, Jordan (1932) 472, Lidstone
(1933) 476, Romanovsky (1927) 489, Sansone
(1933) 490, Shohat (1935) 491, C. D.
Smith (1939) 491, Tartler (1935) 494,
Tchebycheff (1907) 494, Webster (1938)
498, Wishart (1933a) 500, Wong (1935) 501.
—— transformations, Bibl., Landahl (1938) 474,
Ledermann (1938) 475.
Oscillations, in time-series, 369, 370, 380, 397-8. 
See Periodicity. 


p-statistics, Bibl., Roy (1939b, 1942a) 489. See 
Multivariate Analysis. 

P_λ test, see Combination of Tests.

Paired comparisons, Bibl., Kendall and Babington 
Smith (1940) 472. 

Parameters, estimation of, see Estimation. 

—— of location and scale, 40-2.

Partial correlations, Bibl.: Isserlis (1914, 1916) 
470, Stouffer (1934) 493, Subramanian 
(1935) 493. 

Pasteurised milk, in feeding, (Example 21.14) 133. 

Path coefficients, Bibl., Engelhart (1936) 459, 
Wright (1934) 501. 

Paulson, E. A., z-distribution, 118 and N.R., 136. 

Peaks, in time-series, 124. 

Pearson distributions, moments in fitting, 43-4;
sufficient estimators in, (Exercise 17.18) 49.
Bibl.: Ambarzumian (1937) 443, Baker
(1940) 444, Beale (1937) 446, C. C. Craig
(193''b) 454, Dieulefait (1935b) 456, Fisher
(1921a) 461, Hildebrandt (1931) 468, Irwin
(1930) 470, K. Pearson (1894, 1895, 1901b)
483, (1916a) 484, (1924a) 485, Romanovsky
(1924) 489, Wishart (1926) 500. See also
Type I, etc.

Pearson, E. S., confidence intervals for binomial, 
81; t in non-normal case, 103; test of
normality, 106; z in non-normal case,
205; (Exercise 23.4) 216-17; analysis of 
covariance, 238 ; (Exercises 26.2, 26.3, 26.4, 
26.6) 304-5; N.R., 45, 83, 136, 137, 245, 
266, 303, 304, 359. 

——, K., (Example 21.14) 133; N.R., 45, 137,
172, 173, 394. 




Peas, yields of, (Example 23.5) 200-2. 

Periodicity and periodogram analysis, 423-5, 
432-3, 433-5. Bibl.: Alter (1924, 1925, 
1926a, b, 1933, 1937) 443, Beveridge (1921, 
1922) 446, Bradley and Crum (1939) 449, 
Brownlee (1924b) 449, Bruns (1921) 449,
Brunt (1925, 1928) 449, Buys-Ballot (1847) 
450, J. I. Craig (1916) 454, Crum (1923, 
1925) 454, Dodd (1938a) 456, (1939a, b,
1941a, b) 457, Frisch (1928, 1931, 1933)
463, Greenstein (1935) 465, Hersch (1934) 
468, Kalecki (1935) 472, Koopmans (1940) 
474, Kuznets (1929, 1933) 474, Larmor and 
Yamaga (1917) 475, Mitchell (1913) 478, 
Mitchell and Burns (1935) 478, Moore (1914, 
1923) 478, Moulton (1938) 478, Oppenheim 
(1909) 481, Pietra (1925) 486, Pollak (1927) 
487, Pollak and Kaiser (1935) 487, Powell 
(1930) 487, Savur (1941) 490, Schuster 
(1898, 1899, 1906) 490, Soper (1929b) 492,
Starkey (1939) 492, Stumpff (1926, 1937) 
493, Tinbergen (1937, 1938) 495, Tintner 
(1935) 495, Trachtenberg (1921) 495, Vinci 
(1934) 496, Walker (1914, 1925, 1927, 1931) 
498, Wallis and Moore (1941) 498, Yule 
(1927a) 503. See also Harmonic Analysis,
Time-series. 

Phases, in time-series, 124, 125-6. 

Pilot sampling, 252, N.R., 266. 

Pitman, E. J. G., tests of significance, 128-32,
136; z-test, 211; tests of hypotheses,
323-6; Exercises from, (Exercises 17.9,
17.10, 17.11) 47, (Exercise 21.3) 138,
(Exercise 21.15) 140, (Exercise 27.2) 326.
N.R., 45, 137, 216. 

Plant breeding, Bibl., Y. Tang (1938) 494. 

Plot arrangements, Bibl., Tedin (1931) 494. See 
Design. 

Poisson distribution, (Example 17.9) 21-2; con- 
fidence intervals for, (Example 19.4) 70-1, 
81; conditional test for, (Example 21.12) 
127; in variance-analysis, 206-7. 

Bibl.: Ackermann (1939) 442, R. A. 
Chapman (1938) 451, Cochran (1936a, 
1940b) 452, Copeland and Regan (1936) 453, 
Doetsch (1934) 457, Fisher and others 
(1922c) 461, Garwood (1936) 464, Irwin 
(1935, 1937a) 470, Lévy (1937a) 475, Lüders
(1934) 476, Molina (1942) 478, Poisson (1837) 
487, Przyborowski and Wilénski (1940) 487, 
Raikov (1936) 487, Ricker (1937) 488, 
Satterthwaite (1943) 490, “Student” (1907,
1919) 493, Sukhatme (1937b, 1938a) 494,
von Bortkiewicz (1898, 1910) 496, Weida 
(1935) 498, Whitaker (1914) 499. 

Poisson’s theorem, in probability, Bibl., Bochner 
(1936) 447, Bonferroni (1933) 447. See 
Central Limit Theorem. 



Pólya distribution, Bibl., del Chiaro (1936) 456,
S. Guldberg (1935) 466. See Negative 
Binomial. 

Polychoric correlations, Bibl., Pearson and Pearson
(1922b) 485, Ritchie-Scott (1918) 489.

Polynomials, expansions in, Bibl., Cacciopolli
(1932) 450, Davis (1933) 455. See Orthogonal
Polynomials, Curve Fitting.

Population of England and Wales, (Example
22.7) 161-3, (Examples 22.8, 22.9) 164-7,
(Table 29.2, Figure 29.2) 365.

—— analysis, Bibl.: Lotka (1938, 1939) 476,
Pearl and Reed (1923) 482, Volterra (1936) 
496. 

Potato yields, (Example 21.11) 126. 

Power of a test, 272, 307-8. Bibl.: G. W. Brown
(1939) 449, Dantzig (1940) 455, Eisenhart
(1938) 459, MacStewart (1941) 476, Simaika
(1941) 491, P. L. Hsu (1941b) 469, P. C.
Tang (1938) 494. See also Statistical
Hypotheses.

Powers of normal variates, Bibl., Haldane (1942a) 
467. 


Prediction, see Forecasting. 

Pretorius, S. J., N.R., 173.

Principal components, Bibl. : Girshik (1936) 465, 
Hotelling (1933, 1936a) 469, Landahl (1938) 
474, Ledermann (1938) 475, Thurstone 
(1935) 495. 

Probability, Bibl.: Bartlett (1933b) 445, Beck
(1936) 446, Belardinelli (1934) 446, Borel
(1939) 447, Broderick (1937) 449, Cantelli
(1932, 1933b) 450, Castelnuovo (1932) 451,
Cramér (1937, 1938, 1939) 454, de Finetti
(1933a, b, 1939a) 456, Doeblin (1938) 457,
Doob (1934b, 1941) 457, Eggenberger (1924)
459, Erdélyi (1937) 459, Khintchine (1937b)
473, Kolmogoroff (1931, 1933a) 473, Lévy
(1931a, 1931c, 1936a, 1937a, 1938a) 475,
Lomnicki (1923) 476, Marchand (1937) 477,
McKinsey (1939) 477, Moisseiev (1937) 478,
Nagel (1936) 479, Reichenbach (1937) 488,
Rice (1938) 488, Romanovsky (1931a) 489,
Tornier (1929, 1930, 1936, 1937) 495, von
Mises (1919a, b, 1928, 1931, 1936a, b, 1939c,
1941) 497, Urban (1918) 496, Uspensky
(1937) 496.

Probits, Bibl., Bliss (1935, 1937) 447. 

Product, distribution of, Bibl., C. C. Craig (1936a) 
454. 

Product-moment correlation, see Correlation. 

Proficiency test of recruits, (Example 24.7) 240-2. 

Proportionate frequencies, in variance-analysis, 228.

Proportions, tests of, Bibl., Swaroop (1938) 494.


Quadratic forms, see Independence of Quadratic 
Forms. 
Quality control, Bibl.: Becker and others (1930)
446, Jennett and Welch (1939) 471, E. S.
Pearson (1933a, 1934) 482, Shewhart (1931)
491, Simon (1941) 491, Welch (1936b) 498,
Wilks (1941) 500, Wolfowitz (1943) 501.
Quartiles, Bibl., Hojo (1931, 1933) 469.
Quasi-Latin squares, Bibl., Yates (1937a) 502.
Quasi-sufficiency, Bibl., Bartlett (1940) 445. See
Conditional Statistics.


Racial likeness, N.R., 358. Bibl., Morant (1939)
478, K. Pearson (1926b) 485. See Multivariate
Analysis.

Rainfall in London, (Table 29.4, Figure 29.4) 367.

Random component in time-series, 369; effect of
trend-elimination on, 378-87; tests for,
399.

—— migration, Bibl., Brownlee (1911) 449.

—— occurrences, Bibl., Morant (1921) 478.

—— order, tests of, 122-7. Bibl.: (runs, etc.)
André (1884) 444, Besson (1920) 446, Borel
(1933) 447, Denk (1936) 456, Fisher (1926b)
461, Gumbel (1943a) 466, Jones (1937c)
472, Kaucky (1936) 472, Mood (1940) 478,
von Bortkiewicz (1915a, 1917) 496, von
Mises (1921) 497, Wolfowitz (1943) 501.

—— paths, Bibl., McCrea (1936) 477, Pólya
(1938b) 487.

—— samples, tables of, Bibl., Mahalanobis and
others (1934) 476.

—— sampling numbers, Bibl.: Kendall and
Babington Smith (1939a) 472, K. R. Nair
(1938a) 479, Yule (1938a) 503.

—— sequence, Bibl.: Copeland (1928, 1929,
1932, 1936, 1937) 453, Dörge (1934, 1936)
458, Greville (1939) 466, Regan (1936,
1938) 487, Rice (1939) 488, Swed and
Eisenhart (1943) 494, Ville (1936a, b) 496,
von Mises (1931, 1933) 497, Wald (1936b,
1937) 497, Young (1941) 502.

—— variables, Bibl.: Cramér (1935a) 454, Cramér
and others (1938) 454, de Finetti (1929)
455, Eyraud (1938b) 459, Lévy (1934,
1935a, b, 1936c, 1939a, b) 475. See Probability.

Randomisation, and z-test, 209-13, 255-6; in
design, 263-6. Bibl., E. S. Pearson (1937b,
1938) 483; and see Design.

Randomised blocks, 213-14. Bibl.: Cornish
(1940a) 453, McCarthy (1939) 477, Welch
(1937) 498. See Blocks.

Randomness, Bibl.: Borel (1937) 447, Dodd
(1942) 457, Kendall (1941) 472, Kermack
and McKendrick (1936, 1937) 473, Wiener
(1938) 499.

Range, test of, (Exercise 27.3) 327. Bibl.: Geary
(1943) 464, Hartley (1942) 467, McKay and
Pearson (1933) 477, Newman (1939) 480,
Olds (1935) 481, E. S. Pearson (1926, 1932)
482, Pearson and Haines (1935a) 482,
Pearson and Hartley (1942, 1943) 483,
Romanovsky (1933b) 489, W. R. Thompson
(1938) 494, Tippett (1925) 495.

Rank correlation, 123, 441. Bibl. : Daniels (1944) 
455, Dantzig (1939) 455, Dubois (1939) 458, 
Hotelling and Pabst (1936c) 469, Kendall
(1938b, 1942a) 472, Kendall and others
(1939, 1939b) 472, Olds (1938b) 481, K.
Pearson (1914, 1921) 484, Pearson and
Pearson (1931c, 1932) 486, “Student”
(1921) 493, Wallis (1939) 498, Watkins 
(1933) 498, Woodbury (1940) 501. 

Ratio, distribution of, Bibl.: C. C. Craig (1929b)
453, Curtiss (1941) 454, Fieller (1932b) 460,
Geary (1930) 464, Gordon (1941) 465, 
Hirschfeld (1937) 468, Kullback (1936a) 
474, Nicholson (1941) 481, van Uven (1932, 
1939) 496. 

Rectangular distribution, estimation of extremes, 
(Example 17.15) 28; intrinsic accuracy, 
(Example 17.11) 47 ; estimation by sample- 
centre, (Exercise 17.16) 48; confidence 
intervals for range, (Exercise 19.1) 83. 
Bibl.: O. L. Davies (1932) 455, Dunlap 
(1931) 458, Hall (1927b) 467, Olds (1935)
481, Rietz (1931a) 488.

Region of acceptance, 63, 76, 270. 

Regression, Gauss’ theorem on residuals, 60-1; 
generally, 141-74; analytical theory, 
141-5; fitting of curvilinear regressions, 
145-53 ; standard errors and tests of sig- 
nificance, 153-8; equal steps of variate, 
159-67 ; multiple curvilinear, 167; addi- 
tion of new variates, 167-72; in analysis 
of variance, 233-6 ; relation with Hotelling’s 
T, 336-7 ; in discriminatory analysis, 344-5. 

Bibl.: R. G. D. Allen (1939) 443, H. V.
Allen (1938) 443, Andersson (1932) 443,
(1934) 444, Bartlett (1933a, 1938c) 445, F.
Bernstein (1937) 446, Blakeman (1905) 447,
S. S. Bose (1934a, b, 1938b) 448, Camp
(1925b) 450, Cochran (1938a) 452, Dodd
(1937b, c) 457, Dwyer (1937b, 1941c) 458,
Eisenhart (1939) 459, Ezekiel (1930b) 460,
Fisher (1922b) 461, Galton (1886) 464,
Jones (1937b) 472, Koopmans (1937) 474,
Mendershausen (1937a) 477, T. V. Moore
(1937) 478, Neyman (1926) 480, K. Pearson
(1896) 483, (1921, 1926a) 485, Quensel
(1936) 487, Richards (1931) 488, Romanovsky
(1926, 1931b) 489, Slutzky (1914)
491, K. Smith (1918) 492, Waugh (1942)
498, Welch (1935) 498, Wicksell (1934b)
499, Yates (1939d) 502, Yule (1936) 503.

—— coefficients, standard error of, 153-6; exact
tests of, 156-8.

Regular unbiassed critical regions, 318-19. 


Rejection of observations, Bibl.: Irwin (1925b)
470, Pearson and Chandra Sekhar (1930)
483, Rider (1933) 488, W. R. Thompson
(1935) 494.

Relaxed oscillations, Bibl., Le Corbeiller (1933) 
475, van der Pol (1930) 496. 

Reliability coefficients, Bibl., Stouffer (1936b) 493. 

Replication, 255. Bibl.: Bartlett (1938a) 445, 
Cochran (1937b, 1938b, 1939a) 452, Yates
(1933a, b) 500, (1936d) 501. See Design.

Representative method of sampling, Bibl.: A. T. 
Craig (1939) 453, Jensen (1925) 471, Neyman
(1933b, 1934) 480, Sukhatme (1935)
493.

Residual, in variance-analysis, 178, 185-7. 

Ricker, W. E., confidence intervals for Poisson 
distribution, 81. 

Riemann zeta-function, Bibl., Jessen and Wintner 
(1935) 471. 

Risk, theory of, Bibl., Cramér (1923) 454, Esscher 
(1932) 459. 

Robinson, G., N.R., 394, 437. 

Roots of equations, distribution of, Bibl., Girshik 
(1939, 1942) 465. 

Routine analysis, Bibl.: Neyman (1939b, 1941b)
480, Przyborowski and Wilénski (1935d)
487, “Student” (1927) 493.

Roy, S. N., distribution of canonical correlations,
357 and N.R., 359. 

Runs, in time-series, see Random Order. 


Sampling distributions, moments of, see k-statistics, 
Moments. 

—— inquiries, see Design.

——, miscellaneous, Bibl.: Bartky (1943) 445,
Bartlett (1937b) 445, Baten (1933b) 446,
Bowley (1925) 448, Burks (1933) 450, Clapham
(1931, 1936) 452, Cochran (1938b,
1939b, 1942b) 452, A. T. Craig (1933a, b)
453, C. C. Craig (1931a) 453, Crum (1933)
454, David (1938b) 455, Hey (1938) 468,
Hilton (1924, 1928) 468, Kiser (1934) 473,
McKay (1934) 477, Neyman (1933a, 1934,
1938a) 480, Olds (1939, 1940) 481, Panse
(1939) 482, E. S. Pearson (1933a, 1934)
482, Pepper (1929) 486, Rhodes (1925) 488,
Rider (1931b) 488, Rietz (1937) 488, Shewhart
and Winters (1928) 491, “Sophister”
(1928) 492.

—— surveys, Bibl., A. N. Bose (1941) 447, C.
Bose (1943) 447; and see Sampling, miscellaneous.

Sasuly, M., N.R., 394. 

Savur, S. R., N.R., 83.

Scale, estimation of parameters of, 40-2; elimination
of parameters of, 79-80; Pitman’s
tests of, 323-6. Bibl., Pitman (1939a, b)
486.

Scale, reading, Bibl., Yule (1927b) 503.

Scales of measurement, Bibl., Cochran (19438) 452.

Scatterance, N.R., 358. 

Scedastic curve, 142. 

Scheffé, H., non-parametric tests, 322; N.R.,
304, 326.

Schoolchildren, tests of, (Example 25.1) 258-9, 
(Example 28.4) 351-2. 

Schultz, H., N.R., 394. 

Schuster, Sir Arthur, significance of periodogram, 
434; N.R., 437. 

Seasonal effect, in time-series, 369. Bibl.: Bowley
and Smith (1924) 448, Carmichael (1931)
451, Carver (1932) 451, Crum (1925) 454, 
Detroit Edison Co. (1930) 456, Donner 
(1928) 457, Falkner (1924) 460, Gressens 
(1925) 466, Mendershausen (1937b) 478,
Robb (1929, 1930) 489, Wald (1936a) 497, 
Wisniewski (1934) 501, Zrzavy (1933) 503. 

Second Limit Theorem, Bibl., Fréchet and Shohat
(1931) 463.

—— moment, see Variance.

Seed in optical glass, (Example 23.6) 202-5. 

Seeds of wheat, germination of, (Example 23.7) 
207-9. 

Selective confidence intervals, 75-6. 

Semi-normal distribution, Bibl., Steffensen (1937)
492. 

Seminvariants, see Cumulants, k-statistics. 

Sensitivity, of tests of significance, 256. 

Serial correlation, 402-4. See Correlogram. Bibl. : 
R. L. Anderson (1942) 443, Bartlett (1935c) 
445, Dixon (1944) 456, Kendall (1944a, b) 
473, Koopmans (1942) 474, Marples (1932) 
477, Schumann and Hofmeyer (1942) 490, 
Yule (1921) 502, (1926, 1927a) 503. 

Sheep population of England and Wales, (Table 
29.3, Figure 29.3) 366, (Example 29.5) 
385-6, (Example 30.5) 411, (Example 30.8) 
416-18. 

Sheppard’s corrections, see Grouping Corrections.

Shortest confidence intervals, 71-5, 75-6. 

Significance tests, 96-140, 269-327. See Statistical 
Hypotheses. Bibl., Jeffreys (1938a) 471, 
Peiser (1943) 486.

Silverstone, H., minimum variance, 61;
(Exercises 18.1, 18.2) 61.

Simaika, J., N.R., 304, 359. 

Similar regions, 283. Bibl., Feller (1938) 460. 

Simon, L. E., N.R., 61. 

Simple hypotheses, 269, 272-82, 317-26. 

Simultaneous estimation, of several parameters, 
34-44. 

—— fiducial distributions, Bibl., Bartlett (1939a) 
445.

Sinusoidal limit, N.R., 394. Bibl.: Marsueguerra
(1936) 477, Romanovsky (1931c, 1932a,
1933a) 489, Slutzky (1937b) 491.




Skewness, Bibl., Frisch (1934a) 464, Garner (1932)
464. 

Skulls (Egyptian), (Example 28.3) 345-8. 

Slutzky, E., N.R., 394, 399. 

Slutzky-Yule effect, 378-87, 399. Bibl., Slutzky
(1937b) 491, Yule (1921) 502.

Small numbers, law of, see Poisson Distribution. 

Smirnoff, N., ω²-test, 109.

Smith, H. Fairfield, N.R., 359. 

——, K., minimum-χ², 55 and N.R., 61.

Smoothing, see Moving Averages, Trend. 

Soil, loss of weight in, (Example 22.3) 149-52, 
(Example 22.6) 158. 

Solomon, L., footnote, 51. 

Spearman, C., (Exercise 25.3) 267. 

Spearman’s factor theory, see Factor Analysis. 

—— ρ, test of, 132.

Speed tests in children, (Example 28.4) 351-2. 

Spelling ability in children (Example 25.1) 258-9. 

Spencer’s formula in curve fitting, (Examples 29.2, 
29.3) 376-7, 378-80, (Exercise 29.3) 394-5, 
(Example 30.2) 405. 

Spurious correlation, Bibl.: K. Pearson (1897b)
483, Spearman (1907, 1910) 492, Wicksell 
(1921) 499. 

Square of a variate, Bibl., Haldane (1941) 467. 

Squariance, footnote 178. 

Stabilising of variance, 207. 

Stability of series, see Lexis Theory. 

Stable laws of probability, Bibl. : Bochner (1937) 
447, Feldheim (1937a) 460, Khintchine and 
Lévy (1936) 473, Khintchine (1938) 473. 

Standard deviation, estimation of, (Example 17.5) 
6-7, (Example 17.6) 11, 52. See Variance. 

—— errors, in testing significance, 97-8; of
regression coefficients, 153-6. Bibl.: Derkson
(1939) 456, Edgeworth (1908, 1909)
459, Eels (1929) 459, Hendricks (1934) 468,
Isserlis (1915, 1916) 470, Miller (1934) 478,
K. Pearson (1903, 1913, 1920) 484, (1924d)
485, K. Pearson and Lee (1908) 484, K.
Pearson and Filon (1898) 483.

—— Latin squares, 259.

Stationary time-series, 396. Bibl.: Khintchine 
(1932, 1933, 1934) 473, Slutzky (1934) 491, 
Wold (1938a, 1939) 501. See Time-series, 
Correlogram. 

Statistical hypotheses, definition, 269; errors of
first and second kind, 270-2; power
function, 272; simple hypotheses, 272-5;
best critical regions, 277-80; relation with
sufficient estimators, 281-2; composite
hypotheses, 282-3; similar regions, 283-7;
of several degrees of freedom, 287; linear
hypotheses, 292-5; likelihood criteria,
295; k samples, 295-302; bias, 307-26;
regions of Type A, 309-14, of Type A₁,
314-16, of Type B, 316-17, of Type C,
317-22; limiting properties, 322; Pitman’s
tests, 323-6.

Bibl.: G. W. Brown (1940) 449, Chandra
Sekhar and Francis (1941) 451, Daly (1940)
454, Dantzig (1940) 455, Gumbel (1942)
466, R. W. Jackson (1936) 471, Kolodzieczyk
(1933, 1935) 474, Neyman (1935a,
1938b) 480, (1942) 481, Neyman and Pearson
(1928, 1931a, 1933a, c, 1936a, 1938)
480, E. S. Pearson (1941, 1942a) 483,
Pitman (1939) 486, Rietz (1938) 488,
Scheffé (1942a, 1943) 490, Wald (1939a)
497, (1941a) 498, Wilks (1935c, 1938a) 499,
Wolfowitz (1942) 501.

Statistical Review of England and Wales, data from, 
(Example 21.8) 120, (Example 21.9) 121. 

Stevens, W. L., test of significance in periodogram, 
434; N.R., 216. 

Stieltjes integrals, Bibl., Shohat (1930) 491. 

Stochastic convergence, 440. See Convergence in 
Probability. 

—— dependence, see Independence. 

—— processes, Bibl., Doob (1934a, 1937, 1938)
457, Feller (1936a) 460. See Probability.

Stock forecasting, Bibl., Cowles (1933) 453, Cowles 
and Jones (1937) 453. 

Stock, J. S., N.R., 266.

Stratified sampling, 249-52. Bibl.: P. H. Anderson
(1942) 443, Baker (1930c) 444, G. M.
Brown (1933) 449, Frankel and Stock (1939)
463, McKay (1934) 477, Mood (1943) 478.
See also Sampling, miscellaneous, Representative
Method.

“Student” (W. S. Gosset), see Gosset.

Studentisation, 79-81, 134. Bibl., Hartley (1938,
1944) 467, Newman (1939) 480. 

“Student’s” distribution, confidence intervals
based on, 79-80; fiducial inference based
on, 88; properties of, 100-2; in testing
mean, 98-100; in non-normal case, 102-4;
other uses, 104; in testing two means,
109-10, 113-14; in testing Spearman’s ρ,
124; in Pitman’s tests, 131, 132; in testing
regressions, 156, 158, 172; in analysis of
covariance, 244; (Example 26.9) 291.

Bibl.: Bartlett (1935a) 445, C. C. Craig
(1941a) 454, Daniels (1938a) 454, Fisher
(1926a) 461, Geary (1936b) 464, Hendricks
(1936) 468, P. L. Hsu (1938a) 469, N. L.
Johnson and Welch (1940a) 471, Kerrich
(1937) 473, Kolodzieczyk (1933) 474, Ladermann
(1939) 474, McKay and others (1932)
477, Merrington (1942) 478, A. N. K. Nair
(1942) 479, Perlo (1933) 486, Rider (1929)
488, Rietz (1939) 488, Steffensen (1936) 492,
“Student” (1908a, 1931a) 493, Treloar and
Wilder (1934) 495.

—— hypothesis, 285-7. Bibl., Neyman and
Tokarska (1936b) 480, Przyborowski and
Wilénski (1935a) 487.

Stumpff, K., N.R., 437. 

Sufficient estimators, 7-12; given by maximum 
likelihood, 19; general form possessing, 
24-5; distribution of, 25; when range 
depends on parameter, 27-8; for several 
parameters, 39-40; giving minimum- 
variance estimators, 52; relation with 
confidence intervals, 74-5, 79; relation 
with U.M.P. tests, 281-2, with U.M.P.U. 
tests, 310. 

Bibl.: Bartlett (1936b, 1937c, 1940) 445,
Darmois (1935) 455, Koopman (1936) 474, 
Neyman (1935a) 480, Neyman and Pearson 
(1936a) 480, Pitman (1936) 486, Welch 
(1939a) 498. 

Sukhatme, P. V., tables for Behrens’ test, 92, 111; 
(Exercise 26.8) 305-6 ; sampling moments, 
440. N.R., 94, 266, 304. 

Sum, distribution of, see Means. 

Summation convention, 329. 

Sunspots, Bibl., Schuster (1906) 490, Yule (1927)
503. 

Symmetric functions, Bibl., O’Toole (1931, 1932) 
481. See Moments, k-statistics. 


T-distribution, see Hotelling’s T. 

Tabular differences, Bibl., Ladermann and Lowan 
(1939) 474. 

Tanburn, E., N.R., 137. 

Tang, P. C., linear hypotheses, 301; N.R., 303. 

Tchebycheff, P. L., (Exercise 22.4) 173 ; N.R., 172. 

Tchebycheff-Hermite polynomials, Bibl.: Doetsch
(1934) 457, Erdélyi (1938) 459, Feldheim
(1937b) 460. See Gram-Charlier Series,
Orthogonal Polynomials.

Tchebycheff’s inequality, Bibl.: Berge (1938) 
446, Bernstein (1937) 446, Camp (1922) 450, 
C. C. Craig (1933) 454, K. Pearson (1919) 
485, C. D. Smith (1930) 491. 

Tea-drinking, Bibl., Mahalanobis (1943) 476. 

Telephone service, Bibl., Newland and Neal (1939) 
479, Palm (1937) 482. 

Terminals of frequency-distribution, confidence 
intervals for, 83. 

Test construction, Bibl., Cureton and Dunlap
(1938) 454. 

Tests of significance, see Significance, Statistical 
Hypotheses. 

Tetrachoric functions, Bibl. : J. Henderson (1922) 
468, K. Pearson (1912a, 1913a, b) 484, K.
Pearson and Heron (1913c) 484, Newbold
(1925) 479, Pearson and Pearson (1922b)
485. 

Tetrad difference, (Exercise 28.10) 362. Bibl.,
Hotelling (1936b) 469, Wilks (1932d) 499. 
See Factor Analysis. 


Third moment, distribution of, Bibl., Pepper
(1932) 486.

Thompson, C., on λ-tests, 299; N.R., 303.

Thompson, W. R., (Exercise 19.5) 84; N.R., 83.

Thomson, G., (Example 25.1) 258-9.

Ties in ranking, 127, 441.

Time-series, 363-439; examples of, 363-9; trend,
371-8; effect of trend elimination, 378-87;
variate difference method, 387-94; oscillations,
397-9; tests for randomness, 399;
types of oscillatory series, 395-402; serial
correlations, 402-4; correlogram, 404-13;
autoregressive schemes, 414-21; autocorrelation
function, 421-3; periodogram
analysis, 423-33; significance of a periodogram,
433-5; lag correlation, 435-7.

Bibl.: Bartels (1935) 445, Darmois (1929)
455, Davis (1941) 455, Jones (1937b, c) 472,
Kendall (1944a, b) 473, Koopmans (1937,
1940, 1941) 474, Macaulay (1931) 476,
Roos (1934, 1936) 489, von Szeliski (1929)
497, Wallis and Moore (1941) 498, Wold
(1938a) 501, Zaycoff (1936, 1937) 503.

See also Correlogram, Harmonic Analysis,
Periodicity.

Tintner, G., variate-difference method, 393. N.R.,
394.

Tokarska, B., N.R., 303.

Tolerance limits, see Quality Control.

Trade cycles, see Periodicity.

Traffic signals, Bibl., Garwood (1940) 464.

Transformation of distributions, Bibl.: Baker
(1930a, 1934) 444, Beall (1942) 446, Bliss
(1938) 447, Curtiss (1943) 454, Frankel and
Hotelling (1938) 463, Landahl (1938) 474,
Rietz (1931b) 488, Tricomi (1938) 495,
Yasukawa (1925) 501, Zoch (1934) 503.

Transvariation, Bibl., Castellano (1934, 1937) 451.

Travers, R. M. W., N.R., 359.

Trend, 369-70, 371-87. Bibl.: Lorenz (1931,
1935) 476, Macaulay (1931) 476, Rhodes
(1921) 488, Sasuly (1934) 490, Schumann
(1938) 490, Sipos (1930) 491, Working and
Hotelling (1929) 501.

Trough, in time-series, 124.

Truncated normal distribution, Bibl., Keyfitz
(1938) 473, Stevens (1937a) 493.

Atiemeaere, lal, lat, Walt, cei.

Turning-point, in time-series, 124.

Two samples, Bibl.: Behrens (1929) 446, Dixon
(1940) 456, P. L. Hsu (1938a) 469, Lengyel
(1939) 475, Mathisen (1943) 477, E. S.
Pearson (1929) 482, Pearson and Neyman
(1930) 482, K. Pearson (1911a) 484, (1931a)
485, Peek (1937) 486, Rhodes (1924, 1925)
488, Romanovsky (1928) 489, Starkey
(1938) 492, Sukhatme (1935, 1936b) 493,
Swaroop (1938) 494, W. R. Thompson
(1933) 494, Wald and Wolfowitz (1940c)
498, Welch (1938a) 498, Yates (1939f)
501.

Type A, B, C, in statistical tests, 309-27. 

Type I distribution, (Exercise 17.17) 49. 

—— II distribution, Bibl., Carlson (1932) 451.

—— III distribution, estimation of parameters 
in, (Example 17.8) 20-1, (Example 17.13) 
26, (Example 17.19) 39, (Example 18.3) 
53-4; sufficiency, (Example 17.21) 40; 
centre of location of, (Example 17.23) 42; 
confidence intervals for parameter (Example 
19.5) 74-5; fiducial distribution of para- 
meter, 87. Bibl.: C. C. Craig (1929a) 453, 
Kullback (1936a) 474, Olshen (1938) 481, 
Salvosa (1930) 490, Wicksell (1933) 499. 

—— IV distribution, centre of location of, (Exercise
17.15) 48; intrinsic accuracy of,
(Exercise 17.19) 49. 


Unbiassed estimators, 3-4; confidence intervals, 
76; tests, 309-27. 

Unequal subclasses, in variance-analysis, 220-4. 
Bibl.: Brandt (1933) 449, Wald (1940b) 
497, (1941d) 498, Wilks (1938e) 500, Yates 
(1934a) 501. 

Uniformly most powerful tests, 276; unbiassed 
tests, 309, N.R., 359. 

U-shaped distribution, Bibl., Holzinger and Church 
(1929) 469. 


Variability, measures of, Bibl. : Castellano (1935) 
451, de Vergottini (1936) 456, Galvani 
(1931) 464, Gini (1912, 1930) 465, March 
(1926) 477, Pietra (1932a) 486, Vinci (1920) 
496.

Variance, analysis of, see Analysis of Variance. 

——, distribution and tests of, Bibl.: Baker
(1931, 1932, 1935, 1940) 444, Church 
(1925, 1926) 452, A. T. Craig (1932, 1938) 
453, Dunlap (1931) 458, Fertig and Proehl 
(1937) 460, Greenwood and Greville (1939) 
466, Kondo (1930) 474, Le Roux (1931) 
475, K. Pearson (1931d) 486, Quensel 
(1938) 487, Rhodes (1927) 488, Rietz (1931a) 
488, Romanovsky (1925a) 489, Truksa 
(1940) 495, von Bortkiewicz (1922) 497, 
Yasukawa (1925) 501. See also Fisher’s
Distribution, k samples.
——, estimation of, Bibl., O. L. Davies and
Pearson (1934) 455, P. L. Hsu (1938) 469.

—— ratio, Bibl.: S. S. Bose (1935) 448, Cochran
(1941) 452, Finney (1938, 1941a) 460, 
Morgan (1939) 478, U. S. Nair (1941a, b) 
479, Scheffé (1942b) 490. See also Fisher’s
Distribution. 

——, test of, in normal samples, 104; difference
of two variances, 115, (Example 26.8) 289, 




Variate-difference method, 387-94. Bibl.: Anderson
(1914, 1923, 1926, 1929) 443, Cave-Browne-Cave
(1904) 451, Cave and Pearson
(1914) 451, Haavelmo (1941) 467, K.
Pearson and Elderton (1923a) 485, Robb
(1929) 489, “Student” (1914) 493, Tintner
(1935, 1940, 1941) 495, Zaycoff (1936, 1937) 
503. 

Variate transformations, in analysis of variance, 
206-9. See Transformation. 

Variation, coefficient of, Bibl.: Hendricks and 
Robey (1936) 468, McKay (1931) 477, 
McKay and others (1932) 477. 

Variety trials, Bibl., Yates (1936d, 1937a) 502. 

Vector correlation, alienation coefficients, (Exercises
28.8, 28.9, 28.10) 361-2.
—— representation of a sample, Bibl., Bartlett
(1934b) 445.

von Mises, R., ω²-test, 108; Irregular Kollektiv,
123.


Wald, A., most-selective confidence intervals,
82-3; limiting properties of tests, 322,
N.R., 83, 304, 326. 

Walker, Sir Gilbert, time-series, 420; significance 
of a periodogram, 434. 

Wallace, N., N.R., 359. 

Wallis, W. A., phases in time-series, 126, N.R., 136. 

Water-content in samples, (Example 23.3) 190-4, 
(Example 23.4) 196-8. 

Weather, effect on moths, (Example 22.10) 171-2. 

Welch, B. L., difference of two means, 112, 
(Example 21.6) 113; (Exercise 21.7) 139 ; 
Latin squares, 261; footnote 295. N.R.,
45, 83, 216, 304, 359. 

Wheat-price index (of Sir William Beveridge), 
(Table 30.1) 396, (Example 30.4, Table 30.6, 
Figure 30.5) 409-10; (Table 30.9 and 
Figure 30.9) 425-30; (Example 30.5) 
431-2; (Example 30.10) 435. 

Wheat prices, and horse population, (Table 30.10) 
436. 




Whittaker, Sir Edmund, periodogram (Exercise 
30.10) 439, Calculus of Observations, N.R., 
394, 437. 

Wicksell, S. D., theorem on regressions, 148; 
(Example 22.2) 144; (Exercises 22.1, 22.2, 
ea) WB IN, LPR, Sy 

Wiener, N., autocorrelation function, 422. 

Wilks, S. S., shortest confidence intervals, 82;
λ-tests, 299; Hotelling’s T, 337-8; dis-
tribution of means, 341, 358; (Exercise 
19.1) 83, (Exercise 19.4) 84, (Exercises 
28.4, 28.5) 360. N.R., 83, 245, 303, 304, 
359. 

Wilsdon, B. H., N.R., 245. 

Wilson-Hilferty transformation of χ², 118.

Wishart, J., (Exercise 24.3) 246, (Exercises 28.1, 
28.2) 359-60. N.R., 245, 359. 

Wishart’s distribution, 330-5, 337-8, (Exercise 
28.3) 360. Bibl.: P. L. Hsu (1939a) 469, 
Ingham (1933) 470, Wishart (1928) 500, 
Wishart and Bartlett (1933c) 500. 

Wold, H., ω²-test, 108; (Exercise 25.3) 267;
time-series, 418; Carleman criterion, 440. 
N.R., 266, 437. 

Wolfowitz, J., confidence intervals for terminals 
of a distribution, 83. N.R., 304. 

Woodbury, M., tied ranks, 441. 

Wool thread, weights of, (Example 23.2) 183-5. 


Yates, F., tables of t, 102; (Example 23.5) 200-2; 
z-distribution, 206; (Example 23.8) 214; 
(Example 24.1) 221-5; (Example 24.5) 
230-3 ; design of experiments, 263. N.R., 
94, 216, 245, 266. 

Yule, G. U., autoregressive series, 418; (Exercises 
30.3 and 30.9) 439. N.R., 394, 437. 


Zaycoff, R., variate-difference method, 393. N.R., 
394. 
z-distribution, see Fisher’s Distribution. 

